Parsing
Bootstrapping a self-parsing parser is neither fun nor profitable. Let's begin!
For the parser to self-parse, it needs to recreate itself from a source file that describes its own syntax.
Since the parser doesn't exist yet, we can use whatever syntax we want. Go nuts!
We'll then need to imagine being the parser and constructing the machinery necessary to parse the initial version of the source.
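Before anything runs, the machinery can be as small as a handful of parser combinators. The sketch below assumes a combinator style (the text doesn't say which technique the original parser uses). lit, re, seq, alt and many are hypothetical names. Each parser takes an input string and a position and returns a result object.

```javascript
// A minimal sketch of hand-built parsing machinery: parser combinators.
// Each parser takes (input, pos) and returns { ok: true, value, pos } on
// success or { ok: false, pos } on failure. All names are hypothetical.

// Match an exact string.
function lit(str) {
  return (input, pos) =>
    input.startsWith(str, pos)
      ? { ok: true, value: str, pos: pos + str.length }
      : { ok: false, pos };
}

// Match a regular expression anchored at the current position.
function re(pattern) {
  const sticky = new RegExp(pattern.source, "y"); // "y" = match only at lastIndex
  return (input, pos) => {
    sticky.lastIndex = pos;
    const m = sticky.exec(input);
    return m
      ? { ok: true, value: m[0], pos: pos + m[0].length }
      : { ok: false, pos };
  };
}

// Run parsers one after another; fail if any of them fails.
function seq(...parsers) {
  return (input, pos) => {
    const values = [];
    for (const p of parsers) {
      const r = p(input, pos);
      if (!r.ok) return { ok: false, pos };
      values.push(r.value);
      pos = r.pos;
    }
    return { ok: true, value: values, pos };
  };
}

// Try alternatives in order; the first success wins.
function alt(...parsers) {
  return (input, pos) => {
    for (const p of parsers) {
      const r = p(input, pos);
      if (r.ok) return r;
    }
    return { ok: false, pos };
  };
}

// Zero or more repetitions of a parser.
function many(parser) {
  return (input, pos) => {
    const values = [];
    for (;;) {
      const r = parser(input, pos);
      if (!r.ok || r.pos === pos) break; // stop on failure or no progress
      values.push(r.value);
      pos = r.pos;
    }
    return { ok: true, value: values, pos };
  };
}

// Example: a comma-separated list of identifiers, e.g. "a,b,c".
const ident = re(/[A-Za-z_]\w*/);
const list = seq(ident, many(seq(lit(","), ident)));
```

With these definitions, list("a,b,c", 0) returns { ok: true, value: ["a", [[",", "b"], [",", "c"]]], pos: 5 }, and list("1", 0) fails.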
This part is a little tricky, since there is no defined behavior or specification to work against. It helps to keep things fluid: change the syntax where that makes sense, and implement the parsing machinery where you can. Don't be afraid to reorganize big pieces. It will feel like you're not making any progress, since nothing will run for quite a while.
At some point you'll have built enough of the parsing machinery to take a stab at running the thing. This will force you to figure out IO. You'll also need an internal representation of the parsed source: an AST. It makes sense to let the AST drive the parsing machinery, so the AST ends up being a tiny, specialized LISP-like language.
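One way to read "the AST drives the parsing machinery": grammar rules are written as nested arrays whose first element names an operation, and a small interpreter walks them, much like evaluating s-expressions. This is a sketch under that assumption. The operator names and the example grammar are hypothetical, not the original parser's syntax.

```javascript
// A minimal sketch of an AST-driven parser. Grammar rules are s-expressions
// (nested arrays): [operator, ...arguments]. Operator names are hypothetical.
function run(node, input, pos, rules) {
  const [op, ...args] = node;
  switch (op) {
    case "lit": { // exact string
      const s = args[0];
      return input.startsWith(s, pos)
        ? { ok: true, value: s, pos: pos + s.length }
        : { ok: false, pos };
    }
    case "re": { // regex anchored at pos via the sticky "y" flag
      const rx = new RegExp(args[0], "y");
      rx.lastIndex = pos;
      const m = rx.exec(input);
      return m
        ? { ok: true, value: m[0], pos: pos + m[0].length }
        : { ok: false, pos };
    }
    case "seq": { // every child, in order
      const values = [];
      for (const child of args) {
        const r = run(child, input, pos, rules);
        if (!r.ok) return { ok: false, pos };
        values.push(r.value);
        pos = r.pos;
      }
      return { ok: true, value: values, pos };
    }
    case "alt": { // first child that succeeds
      for (const child of args) {
        const r = run(child, input, pos, rules);
        if (r.ok) return r;
      }
      return { ok: false, pos };
    }
    case "many": { // zero or more repetitions
      const values = [];
      for (;;) {
        const r = run(args[0], input, pos, rules);
        if (!r.ok || r.pos === pos) break; // stop on failure or no progress
        values.push(r.value);
        pos = r.pos;
      }
      return { ok: true, value: values, pos };
    }
    case "ref": // named rule lookup, which also allows recursion
      return run(rules[args[0]], input, pos, rules);
    default:
      throw new Error("unknown operator: " + op);
  }
}

// Hypothetical grammar: nested parenthesized lists of identifiers, e.g. "(a (b c))".
const rules = {
  expr: ["alt", ["ref", "list"], ["ref", "ident"]],
  list: ["seq", ["lit", "("], ["many", ["ref", "item"]], ["lit", ")"]],
  item: ["seq", ["re", "\\s*"], ["ref", "expr"], ["re", "\\s*"]],
  ident: ["re", "[A-Za-z_]\\w*"],
};
```

Here run(rules.expr, "(a (b c))", 0, rules) succeeds and consumes all 9 characters, while "(a" fails. Because the grammar is plain data, it can be inspected, serialized, or produced by another parser, which is what makes self-hosting plausible.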
Now the pieces start to come into focus. The program consumes the syntax source with the configured machinery to produce the AST. The AST can be attached to the machinery to create a parser. The AST plus machinery can then be printed as JS source so it can be persisted. This way our meta-parser can be configured to create a whole collection of simple languages.
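The "AST plus machinery, printed as JS source" step can be sketched by serializing the AST with JSON.stringify and the machinery with Function.prototype.toString(), then reviving the result with the Function constructor. Assumptions: the machinery is a single self-contained function that references nothing outside itself, and run, printParser and the yes/no grammar are hypothetical.

```javascript
// Hypothetical machinery: a tiny interpreter for AST nodes. It must not
// reference outer variables, because it is serialized with toString().
function run(node, input, pos, rules) {
  const [op, ...args] = node;
  if (op === "lit") {
    return input.startsWith(args[0], pos)
      ? { ok: true, value: args[0], pos: pos + args[0].length }
      : { ok: false, pos };
  }
  if (op === "seq") {
    const values = [];
    for (const child of args) {
      const r = run(child, input, pos, rules);
      if (!r.ok) return { ok: false, pos };
      values.push(r.value);
      pos = r.pos;
    }
    return { ok: true, value: values, pos };
  }
  if (op === "alt") {
    for (const child of args) {
      const r = run(child, input, pos, rules);
      if (r.ok) return r;
    }
    return { ok: false, pos };
  }
  if (op === "ref") return run(rules[args[0]], input, pos, rules);
  throw new Error("unknown op: " + op);
}

// Print the AST + machinery as the body of a standalone JS function.
function printParser(rules, machinery) {
  return [
    "const rules = " + JSON.stringify(rules) + ";",
    machinery.toString(),
    "return (input) => " + machinery.name + "(rules.start, input, 0, rules);",
  ].join("\n");
}

// Hypothetical AST: "yes" or "no", followed by "!".
const rules = {
  start: ["seq", ["ref", "word"], ["lit", "!"]],
  word: ["alt", ["lit", "yes"], ["lit", "no"]],
};

const source = printParser(rules, run); // text that could be written to a file
const parser = new Function(source)();  // revived as a working parser
```

The source string is what would be persisted; evaluating it later recreates an equivalent parser without the program that produced it. The self-contained requirement matters because toString() captures only a function's text, not the variables it closes over.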
Printing the source back out requires a certain amount of introspection, and there is a tradeoff between introspection and performance. When the parser program runs, some things are precompiled for significant performance gains; with careful construction, those precompiled pieces can remain introspectable as well.
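A precompiled piece might stay introspectable in the following way. This sketch assumes that "precompiling" means turning each AST node into a closure once, with the original node kept on each closure as an ast property. compileRules, recoverAst and the ast property are hypothetical, and any speedup depends on the engine; nothing here is measured.

```javascript
// Hypothetical precompilation: turn each AST node into a closure once, so
// parsing no longer dispatches on node[0] at every step. Each closure keeps
// its source node in an `ast` property so the AST can still be recovered.
function compileRules(rules) {
  const compiled = {};
  const compile = (node) => {
    const [op, ...args] = node;
    let fn;
    if (op === "lit") {
      const s = args[0];
      fn = (input, pos) =>
        input.startsWith(s, pos)
          ? { ok: true, value: s, pos: pos + s.length }
          : { ok: false, pos };
    } else if (op === "seq") {
      const parts = args.map(compile);
      fn = (input, pos) => {
        const values = [];
        for (const p of parts) {
          const r = p(input, pos);
          if (!r.ok) return { ok: false, pos };
          values.push(r.value);
          pos = r.pos;
        }
        return { ok: true, value: values, pos };
      };
    } else if (op === "alt") {
      const parts = args.map(compile);
      fn = (input, pos) => {
        for (const p of parts) {
          const r = p(input, pos);
          if (r.ok) return r;
        }
        return { ok: false, pos };
      };
    } else if (op === "ref") {
      const name = args[0];
      // Resolved at call time, so rules may refer to each other recursively.
      fn = (input, pos) => compiled[name](input, pos);
    } else {
      throw new Error("unknown op: " + op);
    }
    fn.ast = node; // introspection hook: the original AST node rides along
    return fn;
  };
  for (const name of Object.keys(rules)) compiled[name] = compile(rules[name]);
  return compiled;
}

// Recover the AST from compiled rules, e.g. to print them back out as source.
function recoverAst(compiled) {
  return Object.fromEntries(
    Object.entries(compiled).map(([name, fn]) => [name, fn.ast])
  );
}

const rules = {
  start: ["seq", ["ref", "word"], ["lit", "!"]],
  word: ["alt", ["lit", "yes"], ["lit", "no"]],
};
const compiled = compileRules(rules);
```

compiled.start("no!", 0) succeeds, and recoverAst(compiled) returns data equal to the original rules, so the printing step still has something to print.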
The most challenging aspect is keeping straight which form each structure is in.
The parser source is simple enough, but it then needs to be transformed into an
AST. The AST is embedded inside a JavaScript program as JSON, which can cause
some headaches with matching escapes. A JavaScript program's source can be a
string of text inside another JavaScript program, to be evaluated with
Function(), or it can be placed inside a <script> tag to be evaluated at some
later point. The evaluated function can also be turned back into text with
Function.prototype.toString() (the older function.toSource() was non-standard
and only supported by Firefox).
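Here is a sketch of the three layers of embedding, assuming the AST is JSON-compatible data. toScriptSafeJson is a hypothetical helper. Escaping "<" is one common way to keep "</script>" out of inline scripts, not necessarily what the original project does. The sketch uses the standard Function.prototype.toString(), because toSource() was a non-standard, Firefox-only method.

```javascript
// Escape an AST's JSON so it can sit inside JS source that itself sits in an
// HTML <script> tag. JSON.stringify handles quotes and backslashes but leaves
// "<" alone, so a string value like "</script>" would close the tag early.
// In JSON text "<" can only occur inside string values, where "\u003c" is an
// equivalent escape in both JSON and JavaScript.
function toScriptSafeJson(ast) {
  return JSON.stringify(ast).replace(/</g, "\\u003c");
}

// Hypothetical AST whose string values contain awkward characters.
const ast = ["seq", ["lit", "</script>"], ["lit", '"quoted" and \\ backslash']];

// Layer 1: the AST as JSON text.
const json = toScriptSafeJson(ast);
// Layer 2: a JS program whose source embeds the JSON.
const jsSource = "return " + json + ";";
// Layer 3: the same program placed inside a <script> tag.
const html = "<script>window.loadAst = function () {" + jsSource + "};</script>";

// Evaluating layer 2 with Function() rebuilds the original AST.
const loader = new Function(jsSource);
const rebuilt = loader();

// Function.prototype.toString() turns the evaluated function back into text.
const loaderText = loader.toString();
```

Each layer has its own escaping rules: JSON escapes quotes and backslashes, JavaScript must accept the JSON as an expression, and HTML must not see "</script>" before the real end of the tag.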
Again, keeping track of exactly what is being passed around and what it is
embedded in can be challenging.