/usr/share/doc/ruby-treetop/manual-html/semantic_interpretation.html is in ruby-treetop 1.4.10-5.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 | <html><head><link href="./screen.css" rel="stylesheet" type="text/css" />
<script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
</script>
<script type="text/javascript">
_uacct = "UA-3418876-1";
urchinTracker();
</script>
</head><body><div id="top"><div id="main_navigation"><ul><li>Documentation</li><li><a href="contribute.html">Contribute</a></li><li><a href="index.html">Home</a></li></ul></div></div><div id="middle"><div id="main_content"><div id="secondary_navigation"><ul><li><a href="syntactic_recognition.html">Syntax</a></li><li>Semantics</li><li><a href="using_in_ruby.html">Using In Ruby</a></li><li><a href="pitfalls_and_advanced_techniques.html">Advanced Techniques</a></li></ul></div><div id="documentation_content"><h1>Semantic Interpretation</h1>
<p>Lets use the below grammar as an example. It describes parentheses wrapping a single character to an arbitrary depth.</p>
<pre><code>grammar ParenLanguage
rule parenthesized_letter
'(' parenthesized_letter ')'
/
[a-z]
end
end
</code></pre>
<p>Matches:</p>
<ul>
<li><code>'a'</code></li>
<li><code>'(a)'</code></li>
<li><code>'((a))'</code></li>
<li>etc.</li>
</ul>
<p>Output from a parser for this grammar looks like this:</p>
<p><img src="./images/paren_language_output.png" alt="Tree Returned By ParenLanguageParser" /></p>
<p>This is a parse tree whose nodes are instances of <code>Treetop::Runtime::SyntaxNode</code>. What if we could define methods on these node objects? We would then have an object-oriented program whose structure corresponded to the structure of our language. Treetop provides two techniques for doing just this.</p>
<h2>Associating Methods with Node-Instantiating Expressions</h2>
<p>Sequences and all types of terminals are node-instantiating expressions. When they match, they create instances of <code>Treetop::Runtime::SyntaxNode</code>. Methods can be added to these nodes in the following ways:</p>
<h3>Inline Method Definition</h3>
<p>Methods can be added to the nodes instantiated by the successful match of an expression</p>
<pre><code>grammar ParenLanguage
rule parenthesized_letter
'(' parenthesized_letter ')' {
def depth
parenthesized_letter.depth + 1
end
}
/
[a-z] {
def depth
0
end
}
end
end
</code></pre>
<p>Note that each alternative expression is followed by a block containing a method definition. A <code>depth</code> method is defined on both expressions. The recursive <code>depth</code> method defined in the block following the first expression determines the depth of the nested parentheses and adds one to it. The base case is implemented in the block following the second expression; a single character has a depth of 0.</p>
<h3>Custom <code>SyntaxNode</code> Subclass Declarations</h3>
<p>You can instruct the parser to instantiate a custom subclass of Treetop::Runtime::SyntaxNode for an expression by following it by the name of that class enclosed in angle brackets (<code><></code>). The above inline method definitions could have been moved out into a single class like so.</p>
<pre><code># in .treetop file
grammar ParenLanguage
rule parenthesized_letter
'(' parenthesized_letter ')' <ParenNode>
/
[a-z] <ParenNode>
end
end
# in separate .rb file
class ParenNode < Treetop::Runtime::SyntaxNode
def depth
if nonterminal?
parenthesized_letter.depth + 1
else
0
end
end
end
</code></pre>
<h2>Automatic Extension of Results</h2>
<p>Nonterminal and ordered choice expressions do not instantiate new nodes, but rather pass through nodes that are instantiated by other expressions. They can extend nodes they propagate with anonymous or declared modules, using similar constructs used with expressions that instantiate their own syntax nodes.</p>
<h3>Extending a Propagated Node with an Anonymous Module</h3>
<pre><code>rule parenthesized_letter
('(' parenthesized_letter ')' / [a-z]) {
def depth
if nonterminal?
parenthesized_letter.depth + 1
else
0
end
end
}
end
</code></pre>
<p>The parenthesized choice above can result in a node matching either of the two choices. The node will be extended with methods defined in the subsequent block. Note that a choice must always be parenthesized to be associated with a following block, otherwise the block will apply to just the last alternative.</p>
<h3>Extending A Propagated Node with a Declared Module</h3>
<pre><code># in .treetop file
rule parenthesized_letter
('(' parenthesized_letter ')' / [a-z]) <ParenNode>
end
# in separate .rb file
module ParenNode
def depth
if nonterminal?
parenthesized_letter.depth + 1
else
0
end
end
end
</code></pre>
<p>Here the result is extended with the <code>ParenNode</code> module. Note the previous example for node-instantiating expressions, the constant in the declaration must be a module because the result is extended with it.</p>
<h2>Automatically-Defined Element Accessor Methods</h2>
<h3>Default Accessors</h3>
<p>Nodes instantiated upon the matching of sequences have methods automatically defined for any nonterminals in the sequence.</p>
<pre><code>rule abc
a b c {
def to_s
a.to_s + b.to_s + c.to_s
end
}
end
</code></pre>
<p>In the above code, the <code>to_s</code> method calls automatically-defined element accessors for the nodes returned by parsing nonterminals <code>a</code>, <code>b</code>, and <code>c</code>.</p>
<h3>Labels</h3>
<p>Subexpressions can be given an explicit label to have an element accessor method defined for them. This is useful in cases of ambiguity between two references to the same nonterminal or when you need to access an unnamed subexpression.</p>
<pre><code>rule labels
first_letter:[a-z] rest_letters:(', ' letter:[a-z])* {
def letters
[first_letter] + rest_letters.elements.map do |comma_and_letter|
comma_and_letter.letter
end
end
}
end
</code></pre>
<p>The above grammar uses label-derived accessors to determine the letters in a comma-delimited list of letters. The labeled expressions <em>could</em> have been extracted to their own rules, but if they aren't used elsewhere, labels still enable them to be referenced by a name within the expression's methods.</p>
<h3>Overriding Element Accessors</h3>
<p>The module containing automatically defined element accessor methods is an ancestor of the module in which you define your own methods, meaning you can override them with access to the <code>super</code> keyword. Here's an example of how this fact can improve the readability of the example above.</p>
<pre><code>rule labels
first_letter:[a-z] rest_letters:(', ' letter:[a-z])* {
def letters
[first_letter] + rest_letters
end
def rest_letters
super.elements.map { |comma_and_letter| comma_and_letter.letter }
end
}
end
</code></pre>
<h2>Methods Available on <code>Treetop::Runtime::SyntaxNode</code></h2>
<table>
<tr>
<td>
<code>terminal?</code>
</td>
<td>
Was this node produced by the matching of a terminal symbol?
</td>
</tr>
<tr>
<td>
<code>nonterminal?</code>
</td>
<td>
Was this node produced by the matching of a nonterminal symbol?
</td>
<tr>
<td>
<code>text_value</code>
</td>
<td>
The substring of the input represented by this node.
</td>
<tr>
<td>
<code>elements</code>
</td>
<td>
Available only on nonterminal nodes, returns the nodes parsed by the elements of the matched sequence.
</td>
<tr>
<td>
<code>input</code>
</td>
<td>
The entire input string, which is useful mainly in conjunction with <code>interval</code>
</td>
<tr>
<td>
<code>interval</code>
</td>
<td>
The Range of characters in <code>input</code> matched by this rule
</td>
<tr>
<td>
<code>empty?</code>
</td>
<td>
returns true if this rule matched no characters of input
</td>
<tr>
<td>
<code>inspect</code>
</td>
<td>
Handy-dandy method that returns an indented subtree dump of the syntax tree starting here.
This dump includes, for each node, the offset and a snippet of the text this rule matched, and the names of mixin modules and the accessor and extension methods.
</td>
</tr>
</table>
</div></div></div><div id="bottom"></div></body></html>
|