Compiling Prism’s AST

One important class of consumers of Prism’s AST is compilers. Currently CRuby, JRuby, TruffleRuby, and Natalie have all built compilation code on top of Prism’s AST.

This document will describe, at a high level, how CRuby’s compilation of Prism’s AST works.

As described in the build system documentation, there is a “push” Webhook set up within the Prism repo triggered on each new commit to send information about the commit to git.ruby-lang.org. This in turn runs a script to sync over new changes in Prism to their corresponding files in Ruby. Any failures in this sync script will show alerts in the alerts-sync channel in the RubyLang Slack. The result of this step is that files are synced from Prism into ruby/ruby for its use. It is also worth noting that {common.mk} contains a list of Prism files which it needs to correctly compile. If there are new Prism files added, this file should also be updated.

ruby/ruby uses the Prism code to generate an AST from which it can generate instruction sequences. Compilation in ruby/ruby has three main steps:

  1. Compute an AST

Syncing over the Prism code allows ruby/ruby to compute the AST using Prism. It currently does this within {iseq.c} using the pm_parser_init function.

  1. Run a first pass of compilation

Once the AST has been created, it is recursively descended in order to compute the appropriate instruction sequences. This is the crux of compilation, and we go into more detail about nuances in the following paragraphs.

The code for this step is almost exclusively in {prism_compile.c}. The main function used for compilation is pm_compile_node which is essentially a huge switch statement over practically every node type which computes the appropriate instruction sequences for that node type. There are several convenience helpers, such as PM_COMPILE, PM_COMPILE_POPPED, PM_COMPILE_NOT_POPPED which all call into the pm_compile_node function.

There are also several functions, like parse_string, parse_integer which consume Prism nodes and return CRuby values. These are all called for their relevant types within the big switch statement.

The Prism compiler also uses a concept of “scope nodes” which are not standard Prism nodes in the AST, but instead nodes constructed within the compiler for the sole purpose of making compilation easier. Scope nodes are defined in {prism_compile.h} and store information such as locals, local table size, local depth offset and the index lookup tables. Scope nodes can be generated for node types which have their own “scope”.

  1. Run an optimization pass of compilation

After the instruction sequences are initially computed, there is an existing (non-Prism based) optimization pass of the instruction sequences. There are several optimizations currently inlined into step 2, however, most of them happen in this step. Specifically, any peephole optimizations happen in this step. By the end of step 2, however, the instruction sequences take the same form regardless of if the initial AST was generated by Prism or not. Therefore, step 3 is agnostic to the parser, and should not require any Prism specific code.