Posts made in November 2023

Filters:       

Packrat Parsing and Parsing Expression Grammars

Bobby posted in Computer Science | Nov 24, 2023

One of the most important features of the Parsing Expression Grammar formalism is that it is packrat parsable. This means that it can be parsed in linear time using a technique called memoization. This technique is also known as tabling in the logic programming community. The basic idea of a Parsing Expression Grammar or PEG is that you have a DSL or a domain specific language which looks exactly like a BNF, except that it is a program and its a parser. Now, these PEGs can be ambiguous and are capable of backtracking. Packrat parsers make the backtracking efficient with caching. How this works, is that the parser remembers the results of all the sub-parsers it has already run, and if it encounters the same sub-parser again, it just returns the result from the previous run. This behaviour makes packrat parsing is a very powerful technique.

When you design a language you typically want to formalize the syntax with a context-free grammar and then you feed it through a compiler and it'll generate a table-driven, bottom-up parser. Then you may have to hack on the grammar until you get it right because in order for the parsers to be efficient they need to be able to look ahead just one symbol so that they know what choice to make as they typically don't support backtracking. Of course there are versions which support infinite backtracking, but that can make your parser very inefficient. So, usually what you try to do in order to get a very efficient parser is you try and turn it into a nice grammar and then you use the generated parser to do what you like.

Continue Reading | 0 Comments Computer Science 💻 Programming Automata Theory