Prism Ruby parser
pm_strpbrk.h File Reference

A custom strpbrk implementation.

#include "prism/defines.h"
#include "prism/diagnostic.h"
#include "prism/parser.h"
#include <stddef.h>
#include <string.h>

Functions

const uint8_t * pm_strpbrk (pm_parser_t *parser, const uint8_t *source, const uint8_t *charset, ptrdiff_t length, bool validate)
 Here we have rolled our own version of strpbrk.
 

Detailed Description

A custom strpbrk implementation.

Function Documentation

pm_strpbrk()

const uint8_t * pm_strpbrk (pm_parser_t *parser, const uint8_t *source, const uint8_t *charset, ptrdiff_t length, bool validate)

Here we have rolled our own version of strpbrk.

The standard library strpbrk has undefined behavior when the source string is not null-terminated. We want to support strings that are not null-terminated because pm_parse does not require its input to be null-terminated. (This is desirable because it means the extension can call pm_parse with the result of a call to mmap.)

The standard library strpbrk also does not support passing a maximum length to search. We want to support this for the reason mentioned above, but we also don't want it to stop on null bytes. Ruby actually allows null bytes within strings, comments, regular expressions, etc. So we need to be able to skip past them.
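
The two requirements above (a length bound, and no early stop on null bytes) can be sketched in isolation. This is a minimal illustration, not Prism's implementation: the name bounded_strpbrk is hypothetical, it ignores the parser and the validate flag, and it assumes a single-byte encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of Prism: search at most `length` bytes of
 * `source` for any byte listed in the null-terminated `charset`. The source
 * itself does not need to be null-terminated. */
static const uint8_t *
bounded_strpbrk(const uint8_t *source, const uint8_t *charset, ptrdiff_t length) {
    assert(source != NULL && charset != NULL);

    for (ptrdiff_t index = 0; index < length; index++) {
        uint8_t byte = source[index];

        /* strchr considers the terminator part of the string, so a null byte
         * in the source would always "match". Skip it explicitly so that the
         * search continues past embedded null bytes. */
        if (byte != '\0' && strchr((const char *) charset, byte) != NULL) {
            return source + index;
        }
    }

    /* Nothing found within the first `length` bytes. */
    return NULL;
}
```

Because the loop is driven by the length rather than by a terminator, the same helper works on a buffer obtained from mmap and on a source that contains null bytes in the middle.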

Finally, we want to support encodings wherein the charset could contain characters that are trailing bytes of multi-byte characters. For example, in Shift_JIS, the backslash byte (0x5C) can be the trailing byte of a two-byte character. In that case we need to take a slower path and iterate one multi-byte character at a time.
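
The slower path can be illustrated with a simplified Shift_JIS walker. This is a sketch under stated assumptions, not Prism's code: sjis_char_width and multibyte_strpbrk are hypothetical names, the width function only inspects the lead byte (0x81-0x9F or 0xE0-0xFC) and does not validate the trailing byte, and only single-byte characters are compared against the charset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified width function: returns 2 when the first byte is
 * a Shift_JIS lead byte and a second byte is available, otherwise 1. */
static size_t
sjis_char_width(const uint8_t *bytes, ptrdiff_t remaining) {
    uint8_t lead = bytes[0];
    bool is_lead = (lead >= 0x81 && lead <= 0x9F) || (lead >= 0xE0 && lead <= 0xFC);
    return (is_lead && remaining >= 2) ? 2 : 1;
}

/* Hypothetical helper: advance one character at a time so that a trailing
 * byte which happens to equal a charset byte is never reported as a match. */
static const uint8_t *
multibyte_strpbrk(const uint8_t *source, const uint8_t *charset, ptrdiff_t length) {
    ptrdiff_t index = 0;

    while (index < length) {
        size_t width = sjis_char_width(source + index, length - index);

        /* Only single-byte characters can match a byte in the charset. */
        if (width == 1) {
            uint8_t byte = source[index];
            if (byte != '\0' && strchr((const char *) charset, byte) != NULL) {
                return source + index;
            }
        }

        /* Skip the whole character, including any trailing byte. */
        index += (ptrdiff_t) width;
    }

    return NULL;
}
```

For the byte sequence 0x83 0x5C (the katakana character ソ in Shift_JIS), a plain byte-by-byte search for a backslash would stop on the second byte, whereas this walker steps over both bytes as one character.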

Parameters
    parser    The parser.
    source    The source to search.
    charset   The charset to search for.
    length    The maximum number of bytes to search.
    validate  Whether to validate that the source string is valid in the current encoding of the parser.
Returns
    A pointer to the first character in the source string that is in the charset, or NULL if no such character exists.
