Design

There are three overall goals for this project:

The design of the parser is based around these main goals.

Structure

The first piece to understand about the parser is the design of its syntax tree. This is documented in config.yml. Every token and node is defined in that file, along with comments about where they are found in what kinds of syntax. This file is used to template out a lot of different files, all found in the templates directory. The templates/template.rb script performs the templating and outputs all files matching the directory structure found in the templates directory.

The templated files contain all of the code required to allocate and initialize nodes, pretty print nodes, and serialize nodes. This means for the most part, you will only need to then hook up the parser to call the templated functions to create the nodes in the correct position. That means editing the parser itself, which is housed in prism.c.

Pratt parsing

In order to provide the best possible error tolerance, the parser is hand-written. It is structured using Pratt parsing, a technique developed by Vaughan Pratt back in the 1970s. Below are a bunch of links to articles and papers that explain Pratt parsing in more detail.

You can find most of the functions that correspond to constructs in the Pratt parsing algorithm in prism.c. As a couple of examples:

Portability

In order to enable using this parser in other projects, the parser is written in C99, and uses only the standard library. This means it can be embedded in most any other project without having to link against CRuby. It can be used directly through its C API to access individual fields, or it can used to parse a syntax tree and then serialize it to a single blob. For more information on serialization, see the docs/serialization.md file.

Error tolerance

The design of the error tolerance of this parser is still very much in flux. We are experimenting with various approaches as the parser is being developed to try to determine the best approach. Below are a bunch of links to articles and papers that explain error tolerance in more detail, as well as document some of the approaches that we’re evaluating.

Currently, there are a couple of mechanisms for error tolerance that are in place: