LR Parsing of Permutation Phrases
Jana Kostičová
TL;DR
The paper addresses efficient LR parsing of permutation phrases in CFGs by introducing CFG permutation grammars (CFGP) and an expanded grammar that semantically expands permutation phrases. It develops a modified LR parsing algorithm that tracks permutation content with sets, reducing the growth of the number of LR(0) states from factorial to exponential in practical cases, and extends the construction to LR(1) with lookahead. Central contributions include the PERM_CLOSURE and PERM_GOTO constructs, a formal mapping between the original and expanded automata, and a complexity analysis showing large state reductions for independent permutation rules. The approach is demonstrated on JSON-like grammars; the paper also discusses limitations (no nesting of simple phrases or optional elements within a permutation phrase) and outlines directions for extending the method to more complex permutation constructs.
Abstract
This paper presents an efficient method for LR parsing of permutation phrases. In practical cases, the proposed algorithm constructs an LR(0) automaton that requires significantly fewer states to process a permutation phrase than the standard construction. For most real-world grammars, the number of states is reduced from $\Omega(n!)$ to $O(2^{n})$, where $n$ is the length of the permutation phrase, resulting in a much more compact parsing table. The reduction grows with longer permutation phrases and with more permutation phrases within the right-hand side of a rule. We demonstrate the effectiveness of this method through its application to parsing a JSON document.
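
The gap between $\Omega(n!)$ and $O(2^{n})$ can be illustrated with a small counting sketch. It is not the paper's PERM_CLOSURE/PERM_GOTO construction and does not build an LR(0) automaton; it only compares two ways of remembering progress through a permutation phrase of $n$ distinct elements: the ordered prefix read so far, or just the set of elements already seen. The function names `ordered_prefixes` and `seen_sets` are hypothetical, and actual state counts depend on the grammar.

```python
from itertools import permutations

def ordered_prefixes(symbols):
    """Collect every distinct ordered prefix of every permutation of `symbols`.

    This mimics tracking progress by remembering the exact order of the
    elements read so far (an assumption made for illustration only).
    """
    prefixes = set()
    for perm in permutations(symbols):
        for k in range(len(perm) + 1):
            prefixes.add(perm[:k])
    return prefixes

def seen_sets(symbols):
    """Collect the distinct sets of already-seen elements, ignoring order."""
    return {frozenset(prefix) for prefix in ordered_prefixes(symbols)}

if __name__ == "__main__":
    # Columns: n, ordered-prefix count (at least n!), set count (exactly 2**n).
    for n in range(1, 7):
        symbols = [f"a{i}" for i in range(n)]
        print(n, len(ordered_prefixes(symbols)), len(seen_sets(symbols)))
```

For a phrase of three elements this gives 16 ordered prefixes but only 8 sets. The ordered count is at least $n!$ because every full permutation is itself a distinct prefix, while the set count is exactly $2^{n}$.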
