Compiling Prism’s AST¶ ↑
One important class of consumers of Prism’s AST is compilers. Currently CRuby, JRuby, TruffleRuby, and Natalie have all built compilation code on top of Prism’s AST.
This document will describe, at a high level, how CRuby’s compilation of Prism’s AST works.
As described in the build system documentation, there is a “push” Webhook set up within the Prism
repo triggered on each new commit to send information about the commit to git.ruby-lang.org. This in turn runs a script to sync over new changes in Prism
to their corresponding files in Ruby. Any failures in this sync script will show alerts in the alerts-sync channel in the RubyLang Slack. The result of this step is that files are synced from Prism
into ruby/ruby for its use. It is also worth noting that {common.mk
} contains a list of Prism
files which it needs to correctly compile. If there are new Prism
files added, this file should also be updated.
ruby/ruby uses the Prism
code to generate an AST from which it can generate instruction sequences. Compilation in ruby/ruby has three main steps:
-
Compute an AST
Syncing over the Prism
code allows ruby/ruby to compute the AST using Prism
. It currently does this within {iseq.c
} using the pm_parser_init
function.
-
Run a first pass of compilation
Once the AST has been created, it is recursively descended in order to compute the appropriate instruction sequences. This is the crux of compilation, and we go into more detail about nuances in the following paragraphs.
The code for this step is almost exclusively in {prism_compile.c
}. The main function used for compilation is pm_compile_node
which is essentially a huge switch statement over practically every node type which computes the appropriate instruction sequences for that node type. There are several convenience helpers, such as PM_COMPILE
, PM_COMPILE_POPPED
, PM_COMPILE_NOT_POPPED
which all call into the pm_compile_node
function.
There are also several functions, like parse_string
, parse_integer
which consume Prism
nodes and return CRuby values. These are all called for their relevant types within the big switch statement.
The Prism
compiler also uses a concept of “scope nodes” which are not standard Prism
nodes in the AST, but instead nodes constructed within the compiler for the sole purpose of making compilation easier. Scope nodes are defined in {prism_compile.h
} and store information such as locals, local table size, local depth offset and the index lookup tables. Scope nodes can be generated for node types which have their own “scope”.
-
Run an optimization pass of compilation
After the instruction sequences are initially computed, there is an existing (non-Prism based) optimization pass of the instruction sequences. There are several optimizations currently inlined into step 2, however, most of them happen in this step. Specifically, any peephole optimizations happen in this step. By the end of step 2, however, the instruction sequences take the same form regardless of if the initial AST was generated by Prism
or not. Therefore, step 3 is agnostic to the parser, and should not require any Prism
specific code.