When lexing Ruby source, the lexer has a small amount of state to tell which kind of token it is currently lexing.

#include <parser.h>

Public Types
enum	{ PM_LEX_DEFAULT, PM_LEX_EMBEXPR, PM_LEX_EMBVAR, PM_LEX_HEREDOC, PM_LEX_LIST, PM_LEX_REGEXP, PM_LEX_STRING }
	The type of this lex mode.

Data Fields
enum pm_lex_mode:: { ... }	mode
	The type of this lex mode.

union {

struct {

size_t nesting
	This keeps track of the nesting level of the list.

bool interpolation
	Whether or not interpolation is allowed in this list.

uint8_t incrementor
	When lexing a list, the lexer balances the terminator if it is one of (), [], {}, or <>.

uint8_t terminator
	This is the terminator of the list literal.

uint8_t breakpoints [11]
	This is the character set that should be used to delimit the tokens within the list.

} list

struct {

size_t nesting
	This keeps track of the nesting level of the regular expression.

uint8_t incrementor
	When lexing a regular expression, the lexer balances the terminator if it is one of (), [], {}, or <>.

uint8_t terminator
	This is the terminator of the regular expression.

uint8_t breakpoints [7]
	This is the character set that should be used to delimit the tokens within the regular expression.

} regexp

struct {

size_t nesting
	This keeps track of the nesting level of the string.

bool interpolation
	Whether or not interpolation is allowed in this string.

bool label_allowed
	Whether or not at the end of the string we should allow a :, which would indicate this was a dynamic symbol instead of a string.

uint8_t incrementor
	When lexing a string, the lexer balances the terminator if it is one of (), [], {}, or <>.

uint8_t terminator
	This is the terminator of the string. It is typically either a single or double quote.

uint8_t breakpoints [7]
	This is the character set that should be used to delimit the tokens within the string.

} string

struct {

pm_heredoc_lex_mode_t base
	All of the data necessary to lex a heredoc.

const uint8_t * next_start
	This is the pointer to the character where lexing should resume once the heredoc has been completely processed.

size_t * common_whitespace
	This is used to track the amount of common whitespace on each line so that we know how much to dedent each line in the case of a tilde heredoc.

bool line_continuation
	True if the previous token ended with a line continuation.

} heredoc

}	as
	The data associated with this type of lex mode.

struct pm_lex_mode *	prev
	The previous lex mode, which becomes current again when this mode is popped.
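
The common_whitespace field of the heredoc variant is described above as tracking the common whitespace across lines so that a tilde heredoc can be dedented. A minimal sketch of that bookkeeping follows. The helper name sketch_track_common_whitespace is hypothetical and not part of prism, and the sketch assumes that only spaces count as indentation and that whitespace-only lines are skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a prism function): fold the leading-space count of
 * one heredoc line into the running minimum stored in *common_whitespace.
 * The line is passed without its trailing newline. Lines that are empty or
 * contain only spaces are ignored so that they do not force the minimum to 0.
 */
void
sketch_track_common_whitespace(const char *line, size_t length, size_t *common_whitespace) {
    assert(line != NULL && common_whitespace != NULL);

    size_t leading = 0;
    while (leading < length && line[leading] == ' ') leading++;

    /* Blank or whitespace-only line: leave the running minimum untouched. */
    if (leading == length) return;

    if (leading < *common_whitespace) *common_whitespace = leading;
}
```

Starting from SIZE_MAX and feeding each content line through the helper leaves the number of columns that every line can be dedented by. Tab handling is deliberately left out of this sketch.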

Detailed Description

When lexing Ruby source, the lexer has a small amount of state to tell which kind of token it is currently lexing.

For example, when the lexer finds the start of a string, the first token it returns is a TOKEN_STRING_BEGIN token. The lexer is then in PM_LEX_STRING mode and returns the tokens that are found as part of the string.
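
Because each mode carries a prev pointer, the modes behave like a stack implemented as a linked list. The following is a minimal sketch of that idea and not the prism implementation: the sketch_ types and functions are hypothetical stand-ins, the union payload is omitted, and heap allocation is assumed only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for the lex mode type enum. */
typedef enum {
    SKETCH_LEX_DEFAULT,
    SKETCH_LEX_EMBEXPR,
    SKETCH_LEX_STRING
} sketch_mode_type_t;

/* Hypothetical, reduced stand-in for pm_lex_mode: a type plus a prev link. */
typedef struct sketch_lex_mode {
    sketch_mode_type_t mode;
    struct sketch_lex_mode *prev; /* mode to return to when this one is popped */
} sketch_lex_mode_t;

/* Push a new mode on top of `current`. Returns the new top, or NULL if
 * allocation fails (in which case `current` is left untouched). */
sketch_lex_mode_t *
sketch_lex_mode_push(sketch_lex_mode_t *current, sketch_mode_type_t mode) {
    sketch_lex_mode_t *next = malloc(sizeof(*next));
    if (next == NULL) return NULL;

    next->mode = mode;
    next->prev = current;
    return next;
}

/* Pop the current mode, free it, and return the mode beneath it. */
sketch_lex_mode_t *
sketch_lex_mode_pop(sketch_lex_mode_t *current) {
    assert(current != NULL);

    sketch_lex_mode_t *prev = current->prev;
    free(current);
    return prev;
}
```

Under these assumptions, lexing a string that contains an embedded expression would push a string mode, push an embedded-expression mode when the interpolation opens, then pop each in turn as the expression and the string close.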

Member Enumeration Documentation

◆ anonymous enum

anonymous enum

The type of this lex mode.

Enumerator
PM_LEX_DEFAULT	This state is used when any given token is being lexed.
PM_LEX_EMBEXPR	This state is used when we're lexing as normal but inside an embedded expression of a string.
PM_LEX_EMBVAR	This state is used when we're lexing a variable that is embedded directly inside of a string with the # shorthand.
PM_LEX_HEREDOC	This state is used when you are inside the content of a heredoc.
PM_LEX_LIST	This state is used when we are lexing a list of tokens, as in a %w word list literal or a %i symbol list literal.
PM_LEX_REGEXP	This state is used when a regular expression has been begun and we are looking for the terminator.
PM_LEX_STRING	This state is used when we are lexing a string or a string-like token, such as string content delimited by either kind of quote, or an xstring.

Field Documentation

◆ nesting

size_t pm_lex_mode::nesting

This keeps track of the nesting level of the list, string, or regular expression, depending on which member of the union is in use.

◆ interpolation

bool pm_lex_mode::interpolation

Whether or not interpolation is allowed in this list or string.

◆ incrementor

uint8_t pm_lex_mode::incrementor

When lexing a list, string, or regular expression, the lexer balances the terminator if it is one of (), [], {}, or <>. The incrementor is the opening character of the pair, which raises the nesting level each time it appears inside the literal.
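
The interplay of incrementor, terminator, and nesting can be sketched as follows. This is an illustration under stated assumptions and not the prism code: all sketch_ names are hypothetical, escape sequences and interpolation are ignored, and '\0' is assumed to mean "no incrementor" for delimiters that do not pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the opening character that must be balanced, or '\0'
 * when the delimiter is not one of the paired delimiters. */
uint8_t
sketch_incrementor(uint8_t delimiter) {
    switch (delimiter) {
        case '(': case '[': case '{': case '<':
            return delimiter;
        default:
            return '\0';
    }
}

/* Hypothetical helper: the character that closes a literal opened with
 * `delimiter`. Unpaired delimiters close themselves. */
uint8_t
sketch_terminator(uint8_t delimiter) {
    switch (delimiter) {
        case '(': return ')';
        case '[': return ']';
        case '{': return '}';
        case '<': return '>';
        default:  return delimiter;
    }
}

/* Scan `content` (the bytes just after the opening delimiter) and return the
 * index of the terminator that closes the literal, or `length` if the literal
 * is unterminated. Escapes are not handled in this sketch. */
size_t
sketch_find_end(const uint8_t *content, size_t length, uint8_t delimiter) {
    assert(content != NULL);

    uint8_t incrementor = sketch_incrementor(delimiter);
    uint8_t terminator = sketch_terminator(delimiter);
    size_t nesting = 0;

    for (size_t index = 0; index < length; index++) {
        uint8_t byte = content[index];

        if (incrementor != '\0' && byte == incrementor) {
            nesting++; /* a nested opener: one more closer must be skipped */
        } else if (byte == terminator) {
            if (nesting == 0) return index; /* the real end of the literal */
            nesting--;
        }
    }

    return length;
}
```

For a literal opened with [, an inner [b] raises and then lowers the nesting level, so only the final ] ends the literal. For a delimiter such as |, there is no incrementor and the first | ends it.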

◆ terminator

uint8_t pm_lex_mode::terminator

This is the terminator of the list literal, string, or regular expression.

For a string, it is typically either a single or double quote.

◆ breakpoints

uint8_t pm_lex_mode::breakpoints[7]

This is the character set that should be used to delimit the tokens within the list, string, or regular expression. The array holds 11 entries in the list variant and 7 entries in the string and regexp variants.
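
One way to picture a breakpoint set is as a NUL-terminated list of characters at which the lexer must stop and decide what to do, with everything in between treated as plain content. The sketch below assumes that reading. The sketch_ function names and the particular characters placed in the set are illustrative choices, not taken from prism, and the standard strpbrk function stands in for whatever scanning routine the lexer actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fill a 7-byte buffer with a NUL-terminated breakpoint
 * set for a string. The chosen characters are an assumption for illustration:
 * carriage return, newline, backslash, the terminator, then the incrementor
 * (if any) and '#' (if interpolation is allowed). Returns the set's length.
 */
size_t
sketch_string_breakpoints(char breakpoints[7], char incrementor, char terminator, bool interpolation) {
    size_t index = 0;

    breakpoints[index++] = '\r';
    breakpoints[index++] = '\n';
    breakpoints[index++] = '\\';
    breakpoints[index++] = terminator;
    if (incrementor != '\0') breakpoints[index++] = incrementor;
    if (interpolation) breakpoints[index++] = '#';

    assert(index < 7); /* at most 6 characters, leaving room for the NUL */
    breakpoints[index] = '\0';
    return index;
}

/* Return a pointer to the first breakpoint character in `source`, or NULL if
 * the rest of the source is plain content. */
const char *
sketch_next_breakpoint(const char *source, const char *breakpoints) {
    return strpbrk(source, breakpoints);
}
```

With this layout, a set built for a double-quoted string with interpolation stops at the # that begins an embedded expression, while a set built without interpolation skips over it.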

The documentation for this struct was generated from the following file:

include/prism/parser.h
