This file is indexed.

/usr/share/doc/ruby-treetop/manual-html/syntactic_recognition.html is in ruby-treetop 1.4.10-5.

This file is owned by root:root, with mode 0o644.

The actual contents of the file can be viewed below.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
<html><head><link href="./screen.css" rel="stylesheet" type="text/css" />
          <script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
          </script>
          <script type="text/javascript">
          _uacct = "UA-3418876-1";
          urchinTracker();
          </script>
        </head><body><div id="top"><div id="main_navigation"><ul><li>Documentation</li><li><a href="contribute.html">Contribute</a></li><li><a href="index.html">Home</a></li></ul></div></div><div id="middle"><div id="main_content"><div id="secondary_navigation"><ul><li>Syntax</li><li><a href="semantic_interpretation.html">Semantics</a></li><li><a href="using_in_ruby.html">Using In Ruby</a></li><li><a href="pitfalls_and_advanced_techniques.html">Advanced Techniques</a></li></ul></div><div id="documentation_content"><h1>Syntactic Recognition</h1>

<p>Treetop grammars are written in a custom language based on parsing expression grammars. Literature on the subject of <a href="http://en.wikipedia.org/wiki/Parsing_expression_grammar">parsing expression grammars</a> (PEGs) is useful in writing Treetop grammars.</p>

<p>PEGs have no separate lexical analyser (since the algorithm has the same time-complexity guarantees as the best lexical analysers) so all whitespace and other lexical niceties (like comments) must be explicitly handled in the grammar. A further benefit is that multiple PEG grammars may be seamlessly composed into a single parser.</p>

<h1>Grammar Structure</h1>

<p>Treetop grammars look like this:</p>

<pre><code>require "my_stuff"

grammar GrammarName
  include Module::Submodule

  rule rule_name
    ...
  end

  rule rule_name
    ...
  end

  ...
end
</code></pre>

<p>The main keywords are:</p>

<ul>
<li><p><code>grammar</code> : This introduces a new grammar. It is followed by a constant name to which the grammar will be bound when it is loaded.</p></li>
<li><p><code>include</code>: This causes the generated parser to include the referenced Ruby module (which may be another parser)</p></li>
<li><p><code>require</code>: This must be at the start of the file, and is passed through to the emitted Ruby grammar</p></li>
<li><p><code>rule</code> : This defines a parsing rule within the grammar. It is followed by a name by which this rule can be referenced within other rules. It is then followed by a parsing expression defining the rule.</p></li>
</ul>


<p>A grammar may be surrounded by one or more nested <code>module</code> statements, which provides a namespace for the generated Ruby parser.</p>

<p>Treetop will emit a module called <code>GrammarName</code> and a parser class called <code>GrammarNameParser</code> (in the module namespace, if specified).</p>

<h1>Parsing Expressions</h1>

<p>Each rule associates a name with a <em>parsing expression</em>. Parsing expressions are a generalization of vanilla regular expressions. Their key feature is the ability to reference other expressions in the grammar by name.</p>

<h2>Terminal Symbols</h2>

<h3>Strings</h3>

<p>Strings are surrounded in double or single quotes and must be matched exactly.</p>

<ul>
<li><code>"foo"</code></li>
<li><code>'foo'</code></li>
</ul>


<h3>Character Classes</h3>

<p>Character classes are surrounded by brackets. Their semantics are identical to those used in Ruby's regular expressions.</p>

<ul>
<li><code>[a-zA-Z]</code></li>
<li><code>[0-9]</code></li>
</ul>


<h3>The Anything Symbol</h3>

<p>The anything symbol is represented by a dot (<code>.</code>) and matches any single character.</p>

<h3>Ellipsis</h3>

<p>An empty string matches at any position and consumes no input. It's useful when you wish to treat a single symbol as part of a sequence, for example when an alternate rule will be processed using shared code.</p>

<pre>
    rule alts
      ( foo bar / baz '' )
      {
        def value
          elements.map{|e| e.text_value }
        end
      }
    end
</pre>


<h2>Nonterminal Symbols</h2>

<p>Nonterminal symbols are unquoted references to other named rules. They are equivalent to an inline substitution of the named expression.</p>

<pre><code>rule foo
  "the dog " bar
end

rule bar
  "jumped"
end
</code></pre>

<p>The above grammar is equivalent to:</p>

<pre><code>rule foo
  "the dog jumped"
end
</code></pre>

<h2>Ordered Choice</h2>

<p>Parsers attempt to match ordered choices in left-to-right order, and stop after the first successful match.</p>

<pre><code>"foobar" / "foo" / "bar"
</code></pre>

<p>Note that if <code>"foo"</code> in the above expression came first, <code>"foobar"</code> would never be matched.
Note also that the above rule will match <code>"bar"</code> as a prefix of <code>"barbie"</code>.
Care is required when it's desired to match language keywords exactly.</p>

<h2>Sequences</h2>

<p>Sequences are a space-separated list of parsing expressions. They have higher precedence than choices, so choices must be parenthesized to be used as the elements of a sequence.</p>

<pre><code>"foo" "bar" ("baz" / "bop")
</code></pre>

<h2>Zero or More</h2>

<p>Parsers will greedily match an expression zero or more times if it is followed by the star (<code>*</code>) symbol.</p>

<ul>
<li><code>'foo'*</code> matches the empty string, <code>"foo"</code>, <code>"foofoo"</code>, etc.</li>
</ul>


<h2>One or More</h2>

<p>Parsers will greedily match an expression one or more times if it is followed by the plus (<code>+</code>) symbol.</p>

<ul>
<li><code>'foo'+</code> does not match the empty string, but matches <code>"foo"</code>, <code>"foofoo"</code>, etc.</li>
</ul>


<h2>Optional Expressions</h2>

<p>An expression can be declared optional by following it with a question mark (<code>?</code>).</p>

<ul>
<li><code>'foo'?</code> matches <code>"foo"</code> or the empty string.</li>
</ul>


<h2>Repetition count</h2>

<p>A generalised repetition count (minimum, maximum) is also available.</p>

<ul>
<li><code>'foo' 2..</code> matches <code>'foo'</code> two or more times</li>
<li><code>'foo' 3..5</code> matches <code>'foo'</code> from three to five times</li>
<li><code>'foo' ..4</code> matches <code>'foo'</code> from zero to four times</li>
</ul>


<h2>Lookahead Assertions</h2>

<p>Lookahead assertions can be used to make parsing expressions context-sensitive.
The parser will look ahead into the buffer and attempt to match an expression without consuming input.</p>

<h3>Positive Lookahead Assertion</h3>

<p>Preceding an expression with an ampersand <code>(&amp;)</code> indicates that it must match, but no input will be consumed in the process of determining whether this is true.</p>

<ul>
<li><code>"foo" &amp;"bar"</code> matches <code>"foobar"</code> but only consumes up to the end <code>"foo"</code>. It will not match <code>"foobaz"</code>.</li>
</ul>


<h3>Negative Lookahead Assertion</h3>

<p>Preceding an expression with a bang <code>(!)</code> indicates that the expression must not match, but no input will be consumed in the process of determining whether this is true.</p>

<ul>
<li><code>"foo" !"bar"</code> matches <code>"foobaz"</code> but only consumes up to the end <code>"foo"</code>. It will not match <code>"foobar"</code>.</li>
</ul>


<p>Note that a lookahead assertion may be used on any rule, not just a string terminal.</p>

<pre><code>rule things
  thing (!(disallowed / ',') following)*
end
</code></pre>

<p>Here's a common use case:</p>

<pre><code>rule word
  [a-zA-Z]+
end

rule conjunction
  primitive ('and' ' '+ primitive)*
end

rule primitive
  (!'and' word ' '+)*
end
</code></pre>

<p>Here's the easiest way to handle C-style comments:</p>

<pre><code>rule c_comment
  '/*'
  (
    !'*/'
    (. / "\n")
  )*
  '*/'
end
</code></pre>

<h2>Semantic predicates (positive and negative)</h2>

<p>Sometimes you must execute Ruby code during parsing in order to decide how to proceed.
This is an advanced feature, and must be used with great care, because it can change the
way a Treetop parser backtracks in a way that breaks the parsing algorithm. See the
notes below on how to use this feature safely.</p>

<p>The code block is the body of a Ruby lambda block, and should return true or false, to cause this
parse rule to continue or fail (for positive sempreds), fail or continue (for negative sempreds).</p>

<ul>
<li><code>&amp;{ ... }</code> Evaluate the block and fail this rule if the result is false or nil</li>
<li><code>!{ ... }</code> Evaluate the block and fail this rule if the result is not false or nil</li>
</ul>


<p>The lambda is passed a single argument which is the array of syntax nodes matched so far in the
current sequence. Note that because the current rule has not yet succeeded, no syntax node has
yet been constructed, and so the lambda block is being run in a context where the <code>names</code> of the
preceding rules (or as assigned by labels) are not available to access the sub-rules.</p>

<pre><code>rule id
  [a-zA-Z][a-zA-Z0-9]*
  {
    def is_reserved
      ReservedSymbols.include? text_value
    end
  }
end

rule foo_rule
  foo id &amp;{|seq| seq[1].is_reserved } baz
end
</code></pre>

<p>Match "foo id baz" only if <code>id.is_reserved</code>. Note that <code>id</code> cannot be referenced by name from <code>foo_rule</code>,
since that rule has not yet succeeded, but <code>id</code> has completed and so its added methods are available.</p>

<pre><code>rule test_it
  foo bar &amp;{|s| debugger; true } baz
end
</code></pre>

<p>Match <code>foo</code> then <code>bar</code>, stop to enter the debugger (make sure you have said <code>require "ruby-debug"</code> somewhere),
then continue by trying to match <code>baz</code>.</p>

<p>Treetop, like other PEG parsers, achieves its performance guarantee by remembering which rules it has
tried at which locations in the input, and what the result was. This process, called memoization,
requires that the rule would produce the same result (if run again) as it produced the first time when
the result was remembered. If you violate this principle in your semantic predicates, be prepared to
fight Cerberus before you're allowed out of Hades again.</p></div></div></div><div id="bottom"></div></body></html>