Compressed State Table¶ ↑
LR parser generates two large tables, action table and GOTO table. Action table is a matrix of states and tokens. Each cell of action table indicates next action (shift, reduce, accept and error). GOTO table is a matrix of states and nonterminal symbols. Each cell of GOTO table indicates next state.
Action table of “parse.y”:
EOF | LF | NUM | ‘+’ | ‘*’ | ‘(’ | ‘)’ | |
---|---|---|---|---|---|---|---|
State 0 | r1 | s1 | s2 | ||||
State 1 | r3 | r3 | r3 | r3 | r3 | r3 | r3 |
State 2 | s1 | s2 | |||||
State 3 | s6 | ||||||
State 4 | s7 | s8 | s9 | ||||
State 5 | s8 | s9 | s10 | ||||
State 6 | acc | acc | acc | acc | acc | acc | acc |
State 7 | r2 | r2 | r2 | r2 | r2 | r2 | r2 |
State 8 | s1 | s2 | |||||
State 9 | s1 | s2 | |||||
State 10 | r6 | r6 | r6 | r6 | r6 | r6 | r6 |
State 11 | r4 | r4 | s9 | r4 | |||
State 12 | r5 | r5 | r5 | r5 |
GOTO table of “parse.y”:
$accept | program | expr | |
---|---|---|---|
State 0 | g3 | g4 | |
State 1 | |||
State 2 | g5 | ||
State 3 | |||
State 4 | |||
State 5 | |||
State 6 | |||
State 7 | |||
State 8 | g11 | ||
State 9 | g12 | ||
State 10 | |||
State 11 | |||
State 12 |
Both action table and GOTO table are sparse. Therefore LR parser generator compresses both tables and creates these tables.
-
yypact
&yypgoto
-
yytable
-
yycheck
-
yydefact
&yydefgoto
Introduction to major tables¶ ↑
yypact
& yypgoto
¶ ↑
yypact
specifies offset on yytable
for the current state. As an optimization, yypact
also specifies default reduce action for some states. Accessing the value by state
. For example,
offset = yypact[state]
If the value is YYPACT_NINF
(Negative INFinity), it means execution of default reduce action. Otherwise the value is an offset in yytable
.
yypgoto
plays the same role as yypact
. But yypgoto
is used for GOTO table. Then its index is nonterminal symbol id. Especially yypgoto
is used when reduce happens.
rule_for_reduce = rules[rule_id] # lhs_id holds LHS nonterminal id of the rule used for reduce. lhs_id = rule_for_reduce.lhs.id offset = yypgoto[lhs_id] # Validate access to yytable if yycheck[offset + state] == state next_state = yytable[offset + state] end
yytable
¶ ↑
yytable
is a mixture of action table and GOTO table.
For action table¶ ↑
For action table, yytable
specifies what actually to do on the current state.
Positive number means shift and specifies next state. For example, yytable[yyn] == 1
means shift and next state is State 1.
YYTABLE_NINF
(Negative INFinity) means syntax error. For example, yytable[yyn] == YYTABLE_NINF
means syntax error.
Other negative number and zero mean reducing with the rule whose number is opposite. For example, yytable[yyn] == -1
means reduce with Rule 1.
For GOTO table¶ ↑
For GOTO table, yytable
specifies the next state for given LSH nonterminal.
The value is always positive number which means next state id. It never becomes YYTABLE_NINF
.
yycheck
¶ ↑
yycheck
validates accesses to yytable
.
Each line of action table and GOTO table is placed into single array in yytable
. Consider the case where action table has only two states. In this case, if the second array is shifted to the right, they can be merged into one array without conflict.
[ [ 'a', 'b', , , 'e'], # State 0 [ , 'B', 'C', , 'E'], # State 1 ] # => Shift the second array to the right [ [ 'a', 'b', , , 'e'], # State 0 [ , 'B', 'C', , 'E'], # State 1 ] # => Merge them into single array yytable = [ 'a', 'b', 'B', 'C', 'e', 'E' ]
yypact
is an array of each state offset.
yypact = [ 0, # State 0 is not shifted 1 # State 1 is shifted one to right ]
We can access the value of state1[2]
by consulting yypact
.
yytable[yypact[1] + 2] # => yytable[1 + 2] # => 'C'
However this approach doesn’t work well when accessing to nil value like state1[3]
. Because it tries to access to state0[4]
.
yytable[yypact[1] + 3] # => yytable[1 + 3] # => 'e'
This is why yycheck
is needed. yycheck
stores valid indexes of the original table. In the current example:
-
0, 1 and 4 are valid index of State 0
-
1, 2 and 4 are valid index of State 1
yycheck
stores these indexes with same offset with yytable
.
# yytable [ [ 'a', 'b', , , 'e'], # State 0 [ , 'B', 'C', , 'E'], # State 1 ] yytable = [ 'a', 'b', 'B', 'C', 'e', 'E' ] # yycheck [ [ 0, 1, , , 4], # State 0 [ , 1, 2, , 4], # State 1 ] yycheck = [ 0, 1, 1, 2, 4, 4 ]
We can validate accesses to yytable
by consulting yycheck
. yycheck
stores valid indexes in the original arrays then validation is comparing yycheck[index_for_yytable]
and index_for_the_state
. The access is valid if both values are same.
# Validate an access to state1[2] yycheck[yypact[1] + 2] == 2 # => yycheck[1 + 2] == 2 # => 2 == 2 # => true (valid) # Validate an access to state1[3] yycheck[yypact[1] + 3] == 3 # => yycheck[1 + 3] == 3 # => 4 == 3 # => false (invalid)
yydefact
& yydefgoto
¶ ↑
yydefact
stores rule id of default actions for each state. 0
means syntax error, other number means reduce using Rule N.
rule_id = yydefact[state] # => 0 means syntax error, other number means reduce using Rule whose id is `rule_id`
yydefgoto
stores default GOTOs for each nonterminal. The number means next state.
next_state = yydefgoto[lhs_id] # => Next state id is `next_state`
Example¶ ↑
Take a look at compressed tables of “parse.y”. See “parse.output” for detailed information of symbols and states.
yytable
¶ ↑
Original action table and GOTO table look like:
# Action table is a matrix of terminals * states [ # [ EOF, error, undef, LF, NUM, '+', '*', '(', ')'] (default reduce) [ , , , , s1, , , s2, ], # State 0 (r1) [ , , , , , , , , ], # State 1 (r3) [ , , , , s1, , , s2, ], # State 2 () [ s6, , , , , , , , ], # State 3 () [ , , , s7, , s8, s9, , ], # State 4 () [ , , , , , s8, s9, , s10], # State 5 () [ , , , , , , , , ], # State 6 (accept) [ , , , , , , , , ], # State 7 (r2) [ , , , , s1, , , s2, ], # State 8 () [ , , , , s1, , , s2, ], # State 9 () [ , , , , , , , , ], # State 10 (r6) [ , , , , , , s9, , ], # State 11 (r4) [ , , , , , , , , ], # State 12 (r5) ] # GOTO table is a matrix of states * nonterminals [ # [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] State No (default goto) [ , , , , , , , , , , , , ], # $accept (g0) [ g3, , , , , , , , , , , , ], # program (g3) [ g4, , g5, , , , , , g11, g12, , , ], # expr (g4) ] # => Remove default goto [ # [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] State No (default goto) [ , , , , , , , , , , , , ], # $accept (g0) [ , , , , , , , , , , , , ], # program (g3) [ , , g5, , , , , , g11, g12, , , ], # expr (g4) ]
These are compressed to yytable
like below. If offset equals to YYPACT_NINF
, the line has only default value then the line can be ignored (commented out in this example).
[ # Action table # (offset, YYPACT_NINF = -4) [ , , , , s1, , , s2, ], # State 0 ( 6) # [ , , , , , , , , ], # State 1 (-4) [ , , , , s1, , , s2, ], # State 2 ( 6) [ s6, , , , , , , , ], # State 3 ( 1) [ , , , s7, , s8, s9, , ], # State 4 (-1) [ , , , , , s8, s9, , s10], # State 5 ( 3) # [ , , , , , , , , ], # State 6 (-4) # [ , , , , , , , , ], # State 7 (-4) [ , , , , s1, , , s2, ], # State 8 ( 6) [ , , , , s1, , , s2, ], # State 9 ( 6) # [ , , , , , , , , ], # State 10 (-4) [ , , , , , , s9, , ], # State 11 (-3) # [ , , , , , , , , ], # State 12 (-4) # GOTO table # [ , , , , , , , , , , , , ], # $accept (-4) # [ , , , , , , , , , , , , ], # program (-4) [ , , g5, , , , , , g11, g12, , , ], # expr (-2) ] # => compressed into single array [ , , , g5, s6, s7, s9, s8, s9, g11, g12, s8, s9, s1, s10, , s2, ] # => Cut blank cells on head and tail, remove 'g' and 's' prefix, fill blank with 0 # This is `yytable` [ 5, 6, 7, 9, 8, 9, 11, 12, 8, 9, 1, 10, 0, 2]
YYTABLE_NINF
is the minimum negative number. In this case, 0
is the minimum offset number then YYTABLE_NINF
is -1
.
yycheck
¶ ↑
[ # Action table valid indexes # (offset, YYPACT_NINF = -4) [ , , , , 4, , , 7, ], # State 0 ( 6) # [ , , , , , , , , ], # State 1 (-4) [ , , , , 4, , , 7, ], # State 2 ( 6) [ 0, , , , , , , , ], # State 3 ( 1) [ , , , 3, , 5, 6, , ], # State 4 (-1) [ , , , , , 5, 6, , 8], # State 5 ( 3) # [ , , , , , , , , ], # State 6 (-4) # [ , , , , , , , , ], # State 7 (-4) [ , , , , 4, , , 7, ], # State 8 ( 6) [ , , , , 4, , , 7, ], # State 9 ( 6) # [ , , , , , , , , ], # State 10 (-4) [ , , , , , , 6, , ], # State 11 (-3) # [ , , , , , , , , ], # State 12 (-4) # GOTO table valid indexes # [ , , , , , , , , , , , , ], # $accept (-4) # [ , , , , , , , , , , , , ], # program (-4) [ , , 2, , , , , , 8, 9, , , ], # expr (-2) ] # => compressed into single array [ , , , 2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, , 7, ] # => Cut blank cells on head and tail, fill blank with -1 because no index can be -1 and comparison always fails # This is `yycheck` [ 2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, -1, 7]
yypact
& yypgoto
¶ ↑
yypact
& yypgoto
are mixture of offset in yytable
and YYPACT_NINF
(default reduce action). Index in yypact
is state id and index in yypgoto
is nonterminal symbol id. YYPACT_NINF
is the minimum negative number. In this case, -3
is the minimum offset number then YYPACT_NINF
is -4
.
YYPACT_NINF = -4 yypact = [ # 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 (State No) 6, -4, 6, 1, -1, 3, -4, -4, 6, 6, -4, -3, -4 ] yypgoto = [ # $accept, program, expr -4, -4, -2 ]
yydefact
& yydefgoto
¶ ↑
yydefact
& yydefgoto
store default value.
yydefact
specifies rule id of default actions of the state. Because 0
is reserved for syntax error, Rule id starts with 1.
# In "parse.output" Grammar 0 $accept: program "end of file" 1 program: ε 2 | expr LF 3 expr: NUM 4 | expr '+' expr 5 | expr '*' expr 6 | '(' expr ')' # => # In `yydefact` Grammar 0 Syntax Error 1 $accept: program "end of file" 2 program: ε 3 | expr LF 4 expr: NUM 5 | expr '+' expr 6 | expr '*' expr 7 | '(' expr ')'
For example, default action for state 1 is 4 (yydefact[1] == 4
). This means Rule 3 (3 expr: NUM
) in “parse.output” file.
yydefgoto
specifies next state id of the nonterminal.
yydefact = [ # 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 (State No) 2, 4, 0, 0, 0, 0, 1, 3, 0, 0, 7, 5, 6 ] yydefgoto = [ # $accept, program, expr 0, 3, 4 ]
yyr1
& yyr2
¶ ↑
Both of them are tables for rules. yyr1
specifies nonterminal symbol id of rule’s Left-Hand-Side. yyr2
specifies the length of the rule, that is, number of symbols on the rule’s Right-Hand-Side. Index 0 is not used because Rule id starts with 1.
yyr1 = [ # 0, 1, 2, 3, 4, 5, 6, 7 (Rule id) # no rule, $accept, program, program, expr, expr, expr, expr (LHS symbol id) 0, 9, 10, 10, 11, 11, 11, 11 ] yyr2 = [ # 0, 1, 2, 3, 4, 5, 6, 7 (Rule id) 0, 2, 0, 2, 1, 3, 3, 3 ]
How to use tables¶ ↑
See also “parse.rb” which implements LALR parser based on “parse.y” file.
At first, define important constants and arrays:
YYNTOKENS = 9 # The last index of yytable and yycheck # The length of yytable and yycheck are always same YYLAST = 13 YYTABLE_NINF = -1 yytable = [ 5, 6, 7, 9, 8, 9, 11, 12, 8, 9, 1, 10, 0, 2] yycheck = [ 2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, -1, 7] YYPACT_NINF = -4 yypact = [ 6, -4, 6, 1, -1, 3, -4, -4, 6, 6, -4, -3, -4] yypgoto = [ -4, -4, -2] yydefact = [ 2, 4, 0, 0, 0, 0, 1, 3, 0, 0, 7, 5, 6] yydefgoto = [ 0, 3, 4] yyr1 = [ 0, 9, 10, 10, 11, 11, 11, 11] yyr2 = [ 0, 2, 0, 2, 1, 3, 3, 3]
Determine what to do next¶ ↑
Determine what to do next based on current state (state
) and next token (yytoken
).
The first step to decide action is looking up yypact
table by current state. If only default reduce exists for the current state, yypact
returns YYPACT_NINF
.
# Case 1: Only default reduce exists for the state # # State 7 # # 2 program: expr LF • # # $default reduce using rule 2 (program) state = 7 yytoken = nil # Do not use yytoken in this case offset = yypact[state] # -4 if offset == YYPACT_NINF # true next_action = :yydefault return end
If both shift and default reduce exists for the current state, yypact
returns offset in yytable
. Index is the sum of offset
and yytoken
. Need to check index before access to yytable
by consulting yycheck
. Index can be out of range because blank cells on head and tail are omitted, see how yycheck
is constructed in the example above. Therefore need to check an index is not less than 0 and not greater than YYLAST
.
# Case 2: Both shift and default reduce exists for the state # # State 11 # # 4 expr: expr • '+' expr # 4 | expr '+' expr • [LF, '+', ')'] # 5 | expr • '*' expr # # '*' shift, and go to state 9 # # $default reduce using rule 4 (expr) # Next token is '*' then shift it state = 11 yytoken = nil offset = yypact[state] # -3 if offset == YYPACT_NINF # false next_action = :yydefault break end unless yytoken yytoken = yylex() # yylex returns 6 ('*') end idx = offset + yytoken # 3 if idx < 0 || YYLAST < idx # false next_action = :yydefault break end if yycheck[idx] != yytoken # false next_action = :yydefault break end act = yytable[idx] # 9 if act == YYTABLE_NINF # false next_action = :syntax_error break end if act > 0 # true # Shift next_action = :yyshift break else # Reduce next_action = :yyreduce break end
Execute (default) reduce¶ ↑
Once next action is decided to default reduce, need to determine
-
the rule to be applied
-
the next state from GOTO table
Rule id for the default reduce is stored in yydefact
. 0
in yydefact
means syntax error so need to check the value is not 0
before continue the process.
Once rule is determined, the length of the rule can be decided from yyr2
and the LHS nonterminal can be decided from yyr1
.
The next state is determined by LHS nonterminal and the state after reduce. GOTO table is also compressed into yytable
then the process to decide next state is similar to yypact
.
-
Look up
yypgoto
by LHS nonterminal. Noteyypact
is indexed by state butyypgoto
is indexed by nonterminal. -
Check the value on
yypgoto
isYYPACT_NINF
is not. -
Check the index, sum of offset and state, is out of range or not.
-
Check
yycheck
table before access toyytable
.
Finally push the state to the stack.
# State 11 # # 4 expr: expr • '+' expr # 4 | expr '+' expr • [LF, '+', ')'] # 5 | expr • '*' expr # # '*' shift, and go to state 9 # # $default reduce using rule 4 (expr) # Input is "1 + 2 + 3 LF" and next token is the second '+'. # Current state stack is `[0, 4, 8, 11]`. # What to do next is reduce with default action. state = 11 yytoken = 5 # '+' rule = yydefact[state] # 5 if rule == 0 # false next_action = :syntax_error break end rhs_length = yyr2[rule] # 3. Because rule 4 is "expr: expr '+' expr" lhs_nterm = yyr1[rule] # 11 (expr) lhs_nterm_id = lhs_nterm - YYNTOKENS # 11 - 9 = 2 case rule when 1 # Execute Rule 1 action when 2 # Execute Rule 2 action #... when 7 # Execute Rule 7 action end stack.pop(rhs_length) # state stack: `[0, 4, 8, 11]` -> `[0]` state = stack[-1] # state = 0 offset = yypgoto[lhs_nterm_id] # -2 if offset == YYPACT_NINF # false state = yydefgoto[lhs_nterm_id] else idx = offset + state # 0 if idx < 0 || YYLAST < idx # true state = yydefgoto[lhs_nterm_id] # 4 elsif yycheck[idx] != state state = yydefgoto[lhs_nterm_id] else state = yytable[idx] end end # yyval = $$, yyloc = @$ push_state(state, yyval, yyloc) # state stack: [0, 4]