Compressed State Table¶ ↑
LR parser generates two large tables, action table and GOTO table. Action table is a matrix of states and tokens. Each cell of action table indicates next action (shift, reduce, accept and error). GOTO table is a matrix of states and nonterminal symbols. Each cell of GOTO table indicates next state.
Action table of “parse.y”:
| EOF | LF | NUM | ‘+’ | ‘*’ | ‘(’ | ‘)’ | |
|---|---|---|---|---|---|---|---|
| State 0 | r1 | s1 | s2 | ||||
| State 1 | r3 | r3 | r3 | r3 | r3 | r3 | r3 |
| State 2 | s1 | s2 | |||||
| State 3 | s6 | ||||||
| State 4 | s7 | s8 | s9 | ||||
| State 5 | s8 | s9 | s10 | ||||
| State 6 | acc | acc | acc | acc | acc | acc | acc |
| State 7 | r2 | r2 | r2 | r2 | r2 | r2 | r2 |
| State 8 | s1 | s2 | |||||
| State 9 | s1 | s2 | |||||
| State 10 | r6 | r6 | r6 | r6 | r6 | r6 | r6 |
| State 11 | r4 | r4 | s9 | r4 | |||
| State 12 | r5 | r5 | r5 | r5 |
GOTO table of “parse.y”:
| $accept | program | expr | |
|---|---|---|---|
| State 0 | g3 | g4 | |
| State 1 | |||
| State 2 | g5 | ||
| State 3 | |||
| State 4 | |||
| State 5 | |||
| State 6 | |||
| State 7 | |||
| State 8 | g11 | ||
| State 9 | g12 | ||
| State 10 | |||
| State 11 | |||
| State 12 |
Both action table and GOTO table are sparse. Therefore LR parser generator compresses both tables and creates these tables.
-
yypact&yypgoto -
yytable -
yycheck -
yydefact&yydefgoto
Introduction to major tables¶ ↑
yypact & yypgoto¶ ↑
yypact specifies offset on yytable for the current state. As an optimization, yypact also specifies default reduce action for some states. Accessing the value by state. For example,
offset = yypact[state]
If the value is YYPACT_NINF (Negative INFinity), it means execution of default reduce action. Otherwise the value is an offset in yytable.
yypgoto plays the same role as yypact. But yypgoto is used for GOTO table. Then its index is nonterminal symbol id. Especially yypgoto is used when reduce happens.
rule_for_reduce = rules[rule_id] # lhs_id holds LHS nonterminal id of the rule used for reduce. lhs_id = rule_for_reduce.lhs.id offset = yypgoto[lhs_id] # Validate access to yytable if yycheck[offset + state] == state next_state = yytable[offset + state] end
yytable¶ ↑
yytable is a mixture of action table and GOTO table.
For action table¶ ↑
For action table, yytable specifies what actually to do on the current state.
Positive number means shift and specifies next state. For example, yytable[yyn] == 1 means shift and next state is State 1.
YYTABLE_NINF (Negative INFinity) means syntax error. For example, yytable[yyn] == YYTABLE_NINF means syntax error.
Other negative number and zero mean reducing with the rule whose number is opposite. For example, yytable[yyn] == -1 means reduce with Rule 1.
For GOTO table¶ ↑
For GOTO table, yytable specifies the next state for given LSH nonterminal.
The value is always positive number which means next state id. It never becomes YYTABLE_NINF.
yycheck¶ ↑
yycheck validates accesses to yytable.
Each line of action table and GOTO table is placed into single array in yytable. Consider the case where action table has only two states. In this case, if the second array is shifted to the right, they can be merged into one array without conflict.
[ [ 'a', 'b', , , 'e'], # State 0 [ , 'B', 'C', , 'E'], # State 1 ] # => Shift the second array to the right [ [ 'a', 'b', , , 'e'], # State 0 [ , 'B', 'C', , 'E'], # State 1 ] # => Merge them into single array yytable = [ 'a', 'b', 'B', 'C', 'e', 'E' ]
yypact is an array of each state offset.
yypact = [ 0, # State 0 is not shifted 1 # State 1 is shifted one to right ]
We can access the value of state1[2] by consulting yypact.
yytable[yypact[1] + 2] # => yytable[1 + 2] # => 'C'
However this approach doesn’t work well when accessing to nil value like state1[3]. Because it tries to access to state0[4].
yytable[yypact[1] + 3] # => yytable[1 + 3] # => 'e'
This is why yycheck is needed. yycheck stores valid indexes of the original table. In the current example:
-
0, 1 and 4 are valid index of State 0
-
1, 2 and 4 are valid index of State 1
yycheck stores these indexes with same offset with yytable.
# yytable [ [ 'a', 'b', , , 'e'], # State 0 [ , 'B', 'C', , 'E'], # State 1 ] yytable = [ 'a', 'b', 'B', 'C', 'e', 'E' ] # yycheck [ [ 0, 1, , , 4], # State 0 [ , 1, 2, , 4], # State 1 ] yycheck = [ 0, 1, 1, 2, 4, 4 ]
We can validate accesses to yytable by consulting yycheck. yycheck stores valid indexes in the original arrays then validation is comparing yycheck[index_for_yytable] and index_for_the_state. The access is valid if both values are same.
# Validate an access to state1[2] yycheck[yypact[1] + 2] == 2 # => yycheck[1 + 2] == 2 # => 2 == 2 # => true (valid) # Validate an access to state1[3] yycheck[yypact[1] + 3] == 3 # => yycheck[1 + 3] == 3 # => 4 == 3 # => false (invalid)
yydefact & yydefgoto¶ ↑
yydefact stores rule id of default actions for each state. 0 means syntax error, other number means reduce using Rule N.
rule_id = yydefact[state] # => 0 means syntax error, other number means reduce using Rule whose id is `rule_id`
yydefgoto stores default GOTOs for each nonterminal. The number means next state.
next_state = yydefgoto[lhs_id] # => Next state id is `next_state`
Example¶ ↑
Take a look at compressed tables of “parse.y”. See “parse.output” for detailed information of symbols and states.
yytable¶ ↑
Original action table and GOTO table look like:
# Action table is a matrix of terminals * states [ # [ EOF, error, undef, LF, NUM, '+', '*', '(', ')'] (default reduce) [ , , , , s1, , , s2, ], # State 0 (r1) [ , , , , , , , , ], # State 1 (r3) [ , , , , s1, , , s2, ], # State 2 () [ s6, , , , , , , , ], # State 3 () [ , , , s7, , s8, s9, , ], # State 4 () [ , , , , , s8, s9, , s10], # State 5 () [ , , , , , , , , ], # State 6 (accept) [ , , , , , , , , ], # State 7 (r2) [ , , , , s1, , , s2, ], # State 8 () [ , , , , s1, , , s2, ], # State 9 () [ , , , , , , , , ], # State 10 (r6) [ , , , , , , s9, , ], # State 11 (r4) [ , , , , , , , , ], # State 12 (r5) ] # GOTO table is a matrix of states * nonterminals [ # [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] State No (default goto) [ , , , , , , , , , , , , ], # $accept (g0) [ g3, , , , , , , , , , , , ], # program (g3) [ g4, , g5, , , , , , g11, g12, , , ], # expr (g4) ] # => Remove default goto [ # [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] State No (default goto) [ , , , , , , , , , , , , ], # $accept (g0) [ , , , , , , , , , , , , ], # program (g3) [ , , g5, , , , , , g11, g12, , , ], # expr (g4) ]
These are compressed to yytable like below. If offset equals to YYPACT_NINF, the line has only default value then the line can be ignored (commented out in this example).
[ # Action table # (offset, YYPACT_NINF = -4) [ , , , , s1, , , s2, ], # State 0 ( 6) # [ , , , , , , , , ], # State 1 (-4) [ , , , , s1, , , s2, ], # State 2 ( 6) [ s6, , , , , , , , ], # State 3 ( 1) [ , , , s7, , s8, s9, , ], # State 4 (-1) [ , , , , , s8, s9, , s10], # State 5 ( 3) # [ , , , , , , , , ], # State 6 (-4) # [ , , , , , , , , ], # State 7 (-4) [ , , , , s1, , , s2, ], # State 8 ( 6) [ , , , , s1, , , s2, ], # State 9 ( 6) # [ , , , , , , , , ], # State 10 (-4) [ , , , , , , s9, , ], # State 11 (-3) # [ , , , , , , , , ], # State 12 (-4) # GOTO table # [ , , , , , , , , , , , , ], # $accept (-4) # [ , , , , , , , , , , , , ], # program (-4) [ , , g5, , , , , , g11, g12, , , ], # expr (-2) ] # => compressed into single array [ , , , g5, s6, s7, s9, s8, s9, g11, g12, s8, s9, s1, s10, , s2, ] # => Cut blank cells on head and tail, remove 'g' and 's' prefix, fill blank with 0 # This is `yytable` [ 5, 6, 7, 9, 8, 9, 11, 12, 8, 9, 1, 10, 0, 2]
YYTABLE_NINF is the minimum negative number. In this case, 0 is the minimum offset number then YYTABLE_NINF is -1.
yycheck¶ ↑
[ # Action table valid indexes # (offset, YYPACT_NINF = -4) [ , , , , 4, , , 7, ], # State 0 ( 6) # [ , , , , , , , , ], # State 1 (-4) [ , , , , 4, , , 7, ], # State 2 ( 6) [ 0, , , , , , , , ], # State 3 ( 1) [ , , , 3, , 5, 6, , ], # State 4 (-1) [ , , , , , 5, 6, , 8], # State 5 ( 3) # [ , , , , , , , , ], # State 6 (-4) # [ , , , , , , , , ], # State 7 (-4) [ , , , , 4, , , 7, ], # State 8 ( 6) [ , , , , 4, , , 7, ], # State 9 ( 6) # [ , , , , , , , , ], # State 10 (-4) [ , , , , , , 6, , ], # State 11 (-3) # [ , , , , , , , , ], # State 12 (-4) # GOTO table valid indexes # [ , , , , , , , , , , , , ], # $accept (-4) # [ , , , , , , , , , , , , ], # program (-4) [ , , 2, , , , , , 8, 9, , , ], # expr (-2) ] # => compressed into single array [ , , , 2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, , 7, ] # => Cut blank cells on head and tail, fill blank with -1 because no index can be -1 and comparison always fails # This is `yycheck` [ 2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, -1, 7]
yypact & yypgoto¶ ↑
yypact & yypgoto are mixture of offset in yytable and YYPACT_NINF (default reduce action). Index in yypact is state id and index in yypgoto is nonterminal symbol id. YYPACT_NINF is the minimum negative number. In this case, -3 is the minimum offset number then YYPACT_NINF is -4.
YYPACT_NINF = -4 yypact = [ # 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 (State No) 6, -4, 6, 1, -1, 3, -4, -4, 6, 6, -4, -3, -4 ] yypgoto = [ # $accept, program, expr -4, -4, -2 ]
yydefact & yydefgoto¶ ↑
yydefact & yydefgoto store default value.
yydefact specifies rule id of default actions of the state. Because 0 is reserved for syntax error, Rule id starts with 1.
# In "parse.output"
Grammar
0 $accept: program "end of file"
1 program: ε
2 | expr LF
3 expr: NUM
4 | expr '+' expr
5 | expr '*' expr
6 | '(' expr ')'
# =>
# In `yydefact`
Grammar
0 Syntax Error
1 $accept: program "end of file"
2 program: ε
3 | expr LF
4 expr: NUM
5 | expr '+' expr
6 | expr '*' expr
7 | '(' expr ')'
For example, default action for state 1 is 4 (yydefact[1] == 4). This means Rule 3 (3 expr: NUM) in “parse.output” file.
yydefgoto specifies next state id of the nonterminal.
yydefact = [ # 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 (State No) 2, 4, 0, 0, 0, 0, 1, 3, 0, 0, 7, 5, 6 ] yydefgoto = [ # $accept, program, expr 0, 3, 4 ]
yyr1 & yyr2¶ ↑
Both of them are tables for rules. yyr1 specifies nonterminal symbol id of rule’s Left-Hand-Side. yyr2 specifies the length of the rule, that is, number of symbols on the rule’s Right-Hand-Side. Index 0 is not used because Rule id starts with 1.
yyr1 = [ # 0, 1, 2, 3, 4, 5, 6, 7 (Rule id) # no rule, $accept, program, program, expr, expr, expr, expr (LHS symbol id) 0, 9, 10, 10, 11, 11, 11, 11 ] yyr2 = [ # 0, 1, 2, 3, 4, 5, 6, 7 (Rule id) 0, 2, 0, 2, 1, 3, 3, 3 ]
How to use tables¶ ↑
See also “parse.rb” which implements LALR parser based on “parse.y” file.
At first, define important constants and arrays:
YYNTOKENS = 9 # The last index of yytable and yycheck # The length of yytable and yycheck are always same YYLAST = 13 YYTABLE_NINF = -1 yytable = [ 5, 6, 7, 9, 8, 9, 11, 12, 8, 9, 1, 10, 0, 2] yycheck = [ 2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, -1, 7] YYPACT_NINF = -4 yypact = [ 6, -4, 6, 1, -1, 3, -4, -4, 6, 6, -4, -3, -4] yypgoto = [ -4, -4, -2] yydefact = [ 2, 4, 0, 0, 0, 0, 1, 3, 0, 0, 7, 5, 6] yydefgoto = [ 0, 3, 4] yyr1 = [ 0, 9, 10, 10, 11, 11, 11, 11] yyr2 = [ 0, 2, 0, 2, 1, 3, 3, 3]
Determine what to do next¶ ↑
Determine what to do next based on current state (state) and next token (yytoken).
The first step to decide action is looking up yypact table by current state. If only default reduce exists for the current state, yypact returns YYPACT_NINF.
# Case 1: Only default reduce exists for the state # # State 7 # # 2 program: expr LF • # # $default reduce using rule 2 (program) state = 7 yytoken = nil # Do not use yytoken in this case offset = yypact[state] # -4 if offset == YYPACT_NINF # true next_action = :yydefault return end
If both shift and default reduce exists for the current state, yypact returns offset in yytable. Index is the sum of offset and yytoken. Need to check index before access to yytable by consulting yycheck. Index can be out of range because blank cells on head and tail are omitted, see how yycheck is constructed in the example above. Therefore need to check an index is not less than 0 and not greater than YYLAST.
# Case 2: Both shift and default reduce exists for the state # # State 11 # # 4 expr: expr • '+' expr # 4 | expr '+' expr • [LF, '+', ')'] # 5 | expr • '*' expr # # '*' shift, and go to state 9 # # $default reduce using rule 4 (expr) # Next token is '*' then shift it state = 11 yytoken = nil offset = yypact[state] # -3 if offset == YYPACT_NINF # false next_action = :yydefault break end unless yytoken yytoken = yylex() # yylex returns 6 ('*') end idx = offset + yytoken # 3 if idx < 0 || YYLAST < idx # false next_action = :yydefault break end if yycheck[idx] != yytoken # false next_action = :yydefault break end act = yytable[idx] # 9 if act == YYTABLE_NINF # false next_action = :syntax_error break end if act > 0 # true # Shift next_action = :yyshift break else # Reduce next_action = :yyreduce break end
Execute (default) reduce¶ ↑
Once next action is decided to default reduce, need to determine
-
the rule to be applied
-
the next state from GOTO table
Rule id for the default reduce is stored in yydefact. 0 in yydefact means syntax error so need to check the value is not 0 before continue the process.
Once rule is determined, the length of the rule can be decided from yyr2 and the LHS nonterminal can be decided from yyr1.
The next state is determined by LHS nonterminal and the state after reduce. GOTO table is also compressed into yytable then the process to decide next state is similar to yypact.
-
Look up
yypgotoby LHS nonterminal. Noteyypactis indexed by state butyypgotois indexed by nonterminal. -
Check the value on
yypgotoisYYPACT_NINFis not. -
Check the index, sum of offset and state, is out of range or not.
-
Check
yychecktable before access toyytable.
Finally push the state to the stack.
# State 11 # # 4 expr: expr • '+' expr # 4 | expr '+' expr • [LF, '+', ')'] # 5 | expr • '*' expr # # '*' shift, and go to state 9 # # $default reduce using rule 4 (expr) # Input is "1 + 2 + 3 LF" and next token is the second '+'. # Current state stack is `[0, 4, 8, 11]`. # What to do next is reduce with default action. state = 11 yytoken = 5 # '+' rule = yydefact[state] # 5 if rule == 0 # false next_action = :syntax_error break end rhs_length = yyr2[rule] # 3. Because rule 4 is "expr: expr '+' expr" lhs_nterm = yyr1[rule] # 11 (expr) lhs_nterm_id = lhs_nterm - YYNTOKENS # 11 - 9 = 2 case rule when 1 # Execute Rule 1 action when 2 # Execute Rule 2 action #... when 7 # Execute Rule 7 action end stack.pop(rhs_length) # state stack: `[0, 4, 8, 11]` -> `[0]` state = stack[-1] # state = 0 offset = yypgoto[lhs_nterm_id] # -2 if offset == YYPACT_NINF # false state = yydefgoto[lhs_nterm_id] else idx = offset + state # 0 if idx < 0 || YYLAST < idx # true state = yydefgoto[lhs_nterm_id] # 4 elsif yycheck[idx] != state state = yydefgoto[lhs_nterm_id] else state = yytable[idx] end end # yyval = $$, yyloc = @$ push_state(state, yyval, yyloc) # state stack: [0, 4]