Parsing and all that
Abstract
The article discusses parsing techniques, specifically focusing on deterministic parsing algorithms such as LL and LR parsing. It covers topics like push-down automata, context-free grammars, derivations, and parse trees. The article then delves into the details of LL and LR parsing, including their expressive power, table construction, and implementation through recursive descent and recursive ascent parsing.
Q&A
[01] Parsing and all that
1. What is the purpose of this article? The article aims to provide an in-depth exploration of parsing techniques, particularly focusing on deterministic parsing algorithms like LL and LR parsing.
2. What is the relationship between push-down automata and context-free grammars? Push-down automata (PDAs) are automata extended with a stack, which lets them remember unbounded information. They recognize exactly the languages that context-free grammars describe; for example, the language of words consisting of zeroes followed by an equal number of ones is out of reach for a plain finite automaton but is described by a small context-free grammar and recognized by a PDA (a recognizer sketch follows this list).
3. What is the difference between leftmost and rightmost derivations? Leftmost and rightmost derivations differ in the order in which grammar rules are applied during the derivation. A leftmost derivation always expands the leftmost sort in the sentential form, while a rightmost derivation always expands the rightmost one. This order matters for things like the order of semantic actions and ease of implementation (a worked example follows this list).
4. What is the difference between parse trees and abstract syntax trees? Parse trees, or concrete syntax trees, record every rule application exactly as written in the grammar. Abstract syntax trees abstract over parts of the concrete syntax, such as whitespace or parentheses, to capture only the essential structure (see the sketch after this list).
5. What is the significance of ambiguous grammars? If a grammar can produce multiple distinct parse trees (equivalently, multiple distinct leftmost derivations) for the same sentence, the grammar is ambiguous. Ambiguous grammars can lead to issues with the order of semantic actions and the predictability of the parser (an example follows this list).
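To make the PDA/grammar correspondence from item 2 concrete, here is a minimal Rust sketch (my own illustration, not code from the article) of a push-down recognizer for the zeroes-then-equally-many-ones language. It assumes at least one zero is required and uses an explicit `Vec` in place of the PDA's stack:

```rust
// Minimal sketch (not from the article) of a push-down recognizer for the
// language 0^n 1^n with n >= 1, described by the hypothetical grammar
//   S -> 0 S 1 | 0 1
// The explicit Vec plays the role of the PDA's stack.
fn recognize(input: &str) -> bool {
    let mut stack = Vec::new();
    let mut chars = input.chars().peekable();
    // Push one marker per leading '0'.
    while chars.peek() == Some(&'0') {
        chars.next();
        stack.push('0');
    }
    if stack.is_empty() {
        return false; // at least one '0' is required
    }
    // Pop one marker per '1'.
    while chars.peek() == Some(&'1') {
        chars.next();
        if stack.pop().is_none() {
            return false; // more ones than zeroes
        }
    }
    // Accept only if the input is exhausted and every '0' was matched.
    chars.next().is_none() && stack.is_empty()
}

fn main() {
    assert!(recognize("0011"));
    assert!(!recognize("001"));
    assert!(!recognize("010"));
}
```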
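The derivation orders from item 3 can be seen on a small hypothetical grammar (chosen here for illustration; the article's own examples may differ):

```latex
% Hypothetical grammar (not from the article):  E -> E + T | T,   T -> id
% Leftmost derivation of "id + id" (always expand the leftmost sort):
\[ E \Rightarrow E + T \Rightarrow T + T \Rightarrow \mathit{id} + T \Rightarrow \mathit{id} + \mathit{id} \]
% Rightmost derivation of the same sentence (always expand the rightmost sort):
\[ E \Rightarrow E + T \Rightarrow E + \mathit{id} \Rightarrow T + \mathit{id} \Rightarrow \mathit{id} + \mathit{id} \]
```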
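For item 4, a rough Rust sketch of the two tree shapes (types and names are mine, not the article's) shows what abstracting over parts of the concrete syntax means for an expression like "(1 + 2)":

```rust
// Rough sketch (types and names are mine, not the article's) of the two tree
// shapes for the expression "(1 + 2)".

/// Concrete parse tree: one node per rule application, every token kept,
/// including the parentheses.
#[derive(Debug)]
enum ParseTree {
    Node { rule: &'static str, children: Vec<ParseTree> },
    Token(String),
}

/// Abstract syntax tree: parentheses and helper rules are gone; only the
/// operator structure and the literals remain.
#[derive(Debug)]
enum Expr {
    Add(Box<Expr>, Box<Expr>),
    Num(i64),
}

fn main() {
    // "(1 + 2)" as a concrete parse tree...
    let cst = ParseTree::Node {
        rule: "Expr -> ( Expr + Expr )",
        children: vec![
            ParseTree::Token("(".into()),
            ParseTree::Node { rule: "Expr -> num", children: vec![ParseTree::Token("1".into())] },
            ParseTree::Token("+".into()),
            ParseTree::Node { rule: "Expr -> num", children: vec![ParseTree::Token("2".into())] },
            ParseTree::Token(")".into()),
        ],
    };
    // ...and as an abstract syntax tree.
    let ast = Expr::Add(Box::new(Expr::Num(1)), Box::new(Expr::Num(2)));
    println!("{cst:?}\n{ast:?}");
}
```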
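And for item 5, the textbook ambiguous expression grammar (a standard example, not necessarily the one the article uses) shows how one sentence can have two parse trees:

```latex
% Classic ambiguous grammar (a standard example, not necessarily the article's):
%   E -> E + E | E * E | n        (n is a number literal)
% The sentence "n + n * n" has two distinct leftmost derivations, hence two parse trees:
\[ E \Rightarrow E + E \Rightarrow n + E \Rightarrow n + E * E \Rightarrow n + n * E \Rightarrow n + n * n \]
\[ E \Rightarrow E * E \Rightarrow E + E * E \Rightarrow n + E * E \Rightarrow n + n * E \Rightarrow n + n * n \]
% The first groups the sentence as n + (n * n), the second as (n + n) * n.
```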
[02] Top-down, (Strong) LL parsing
1. What is the difference between LL(1) and LL(k) grammars? LL(1) grammars need only one symbol of lookahead to decide which grammar rule to apply, while LL(k) grammars may need up to k symbols. LL(1) grammars are always strong LL grammars, meaning the combination of the sort being parsed and the next symbol(s) is enough to pick a unique grammar rule (see the sketch after this list).
2. How can strong LL grammars be constructed from non-strong LL grammars? By marking the position in a rule with a symbol, a non-strong LL grammar can be transformed into an equivalent strong LL grammar, although this may result in a larger grammar and parse table.
3. What are the limitations of LL parsing in terms of expressive power? LL(k) grammars do not capture all deterministic context-free languages, as there are languages that cannot be expressed by any LL(k) grammar for any k.
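As a concrete illustration of item 1, here is a minimal recursive descent sketch (my own, for a made-up grammar, not code from the article) where a single symbol of lookahead uniquely determines the rule:

```rust
// Minimal sketch (mine, not the article's) of LL(1) prediction for the
// hypothetical grammar
//   S -> 'a' S | 'b'
// One symbol of lookahead (the next byte) uniquely determines the rule.
fn parse_s(input: &[u8]) -> Option<&[u8]> {
    match input.first() {
        Some(&b'a') => parse_s(&input[1..]), // rule S -> 'a' S
        Some(&b'b') => Some(&input[1..]),    // rule S -> 'b'
        _ => None,                           // no rule starts with this symbol
    }
}

fn main() {
    // "aaab" parses, leaving no remaining input.
    assert_eq!(parse_s(b"aaab"), Some(&b""[..]));
    // "aac" is rejected: neither rule can start with 'c'.
    assert_eq!(parse_s(b"aac"), None);
}
```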
[03] Bottom-up, LR parsing
1. How does LR parsing differ from LL parsing in terms of derivation order? LR parsing scans the input left to right and produces a rightmost derivation in reverse, while LL parsing scans left to right and produces a leftmost derivation. In practice this means an LR parser decides which rule it has applied only after consuming the entire body of the rule, whereas an LL parser must decide before consuming the input for that rule (a worked trace follows this list).
2. What is the relationship between LL and LR in terms of expressive power? The set of all LL(k) languages for any k is a strict subset of all LR(1) languages. Any LL(k) grammar is also an LR(k) grammar, but the reverse is not necessarily true.
3. How does the LR automaton differ from the LL automaton? The LR automaton explores all rules that could apply simultaneously, using epsilon transitions into their items, rather than committing to a single rule up front like the LL automaton. This lets it decide which rule to reduce only after seeing the rule's entire body plus the look-ahead, instead of deciding from just the current sort and input (see the item-set sketch after this list).
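To illustrate item 1, here is a worked trace on the same hypothetical expression grammar used earlier; the sequence of reductions an LR parser performs is exactly the rightmost derivation read backwards:

```latex
% Reusing the hypothetical grammar  E -> E + T | T,   T -> id  from above.
% Rightmost derivation of "id + id":
\[ E \Rightarrow E + T \Rightarrow E + \mathit{id} \Rightarrow T + \mathit{id} \Rightarrow \mathit{id} + \mathit{id} \]
% An LR parser replays these steps in reverse (each \leadsto is a reduction,
% performed only once the rule's whole body has been consumed):
\[ \mathit{id} + \mathit{id} \;\leadsto\; T + \mathit{id} \;\leadsto\; E + \mathit{id} \;\leadsto\; E + T \;\leadsto\; E \]
```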
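For item 3, standard LR(0) item notation (not necessarily the article's notation) shows how epsilon transitions let the automaton track every candidate rule at once:

```latex
% LR(0) items (standard notation, not necessarily the article's) for the
% hypothetical grammar  E -> E + T | T,   T -> id.
% The start state holds E' -> . E; epsilon transitions -- the closure -- pull
% in the items of every rule that could begin here, so all candidate rules
% are explored at the same time:
\[\begin{aligned}
  &E' \to {\cdot}\; E \\
  &E  \to {\cdot}\; E + T \\
  &E  \to {\cdot}\; T \\
  &T  \to {\cdot}\; \mathit{id}
\end{aligned}\]
```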
[04] Recursive Ascent
1. How does recursive ascent parsing differ from recursive descent parsing? In recursive ascent parsing, the states of the LR automaton are represented as functions, which either shift input, reduce by a rule, or call another state function. In recursive descent parsing, by contrast, it is the grammar's rules and sorts that are represented as functions (a sketch follows this list).
2. What is the purpose of the "returns" in the recursive ascent code? The "returns" in the recursive ascent code keep track of how many symbols need to be popped off the stack when a reduction is performed, so that the parser can return to the appropriate previous state.
3. How does the recursive ascent-descent approach combine LL and LR parsing? The recursive ascent-descent approach uses the LR automaton to filter the input and select the appropriate rule, and then switches to LL parsing to finish parsing the rule body, taking advantage of the strengths of both techniques.
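To ground items 1 and 2, here is a small recursive ascent sketch in Rust (my own illustration for a made-up grammar, not the article's code). Each LR(0) state becomes a function, and the `pop` counter plays the role of the "returns": it records how many call frames (stack symbols) still have to be unwound when a reduction happens, so control ends up back in the state that handles the goto on the reduced sort.

```rust
// Recursive ascent sketch (mine, not the article's) for the made-up grammar
//   S -> a S b | c
// A reduction by S -> a S b unwinds three call frames (states), S -> c one;
// every frame the unwinding passes through decrements `pop` by one.

enum Outcome<'a> {
    /// A reduction to S is unwinding: `pop` more state functions must return
    /// before we reach the state that handles the goto on S.
    Reduce { pop: usize, rest: &'a [u8] },
    Accept,
    Error,
}

/// State 0: items S' -> . S,  S -> . a S b,  S -> . c
fn state0(input: &[u8]) -> Outcome<'_> {
    let shifted = match input.first() {
        Some(&b'a') => state2(&input[1..]),
        Some(&b'c') => state3(&input[1..]),
        _ => return Outcome::Error,
    };
    match shifted {
        // A finished S brings us to state 1 (goto), which accepts at end of input.
        Outcome::Reduce { pop: 0, rest } => state1(rest),
        _ => Outcome::Error,
    }
}

/// State 1: item S' -> S .  (accept if the input is exhausted)
fn state1(input: &[u8]) -> Outcome<'_> {
    if input.is_empty() { Outcome::Accept } else { Outcome::Error }
}

/// State 2: items S -> a . S b,  S -> . a S b,  S -> . c
fn state2(input: &[u8]) -> Outcome<'_> {
    let mut result = match input.first() {
        Some(&b'a') => state2(&input[1..]),
        Some(&b'c') => state3(&input[1..]),
        _ => return Outcome::Error,
    };
    loop {
        result = match result {
            // The reduced S belongs to this frame: goto(2, S) = state 4.
            Outcome::Reduce { pop: 0, rest } => state4(rest),
            // A larger reduction is still unwinding: pop this frame as well.
            Outcome::Reduce { pop, rest } => return Outcome::Reduce { pop: pop - 1, rest },
            other => return other,
        };
    }
}

/// State 3: item S -> c .  (reduce; the body is one symbol, and this frame is it)
fn state3(input: &[u8]) -> Outcome<'_> {
    Outcome::Reduce { pop: 0, rest: input }
}

/// State 4: item S -> a S . b  (the only legal continuation is shifting 'b')
fn state4(input: &[u8]) -> Outcome<'_> {
    match input.first() {
        Some(&b'b') => match state5(&input[1..]) {
            // State 5's reduction must unwind through this frame too.
            Outcome::Reduce { pop, rest } => Outcome::Reduce { pop: pop - 1, rest },
            other => other,
        },
        _ => Outcome::Error,
    }
}

/// State 5: item S -> a S b .  (reduce; two more frames to pop after this one)
fn state5(input: &[u8]) -> Outcome<'_> {
    Outcome::Reduce { pop: 2, rest: input }
}

fn main() {
    assert!(matches!(state0(b"aacbb"), Outcome::Accept));
    assert!(matches!(state0(b"acb"), Outcome::Accept));
    assert!(matches!(state0(b"ab"), Outcome::Error));
}
```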