Prism Ruby parser
A custom strpbrk implementation.
#include "prism/defines.h"
#include "prism/diagnostic.h"
#include "prism/parser.h"
#include <stddef.h>
#include <string.h>
Functions
const uint8_t * pm_strpbrk (pm_parser_t *parser, const uint8_t *source, const uint8_t *charset, ptrdiff_t length, bool validate)
    Here we have rolled our own version of strpbrk.
const uint8_t * pm_strpbrk(pm_parser_t *parser,
                           const uint8_t *source,
                           const uint8_t *charset,
                           ptrdiff_t length,
                           bool validate)
Here we have rolled our own version of strpbrk.
The standard library strpbrk has undefined behavior when the source string is not null-terminated. We want to support strings that are not null-terminated because pm_parse does not require its input to be null-terminated. (This is desirable because it means the extension can call pm_parse with the result of a call to mmap.)
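A minimal sketch of a length-bounded search follows. It assumes, as the parameter list suggests, that `charset` is an ordinary NUL-terminated string while `source` is not. The name `bounded_strpbrk` is hypothetical and this is not prism's implementation. It omits the parser and validation arguments and examines at most `length` bytes without ever looking for a terminator in `source`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bounded strpbrk: examines at most `length` bytes of `source`
 * and never searches for a terminator, so `source` may point into an
 * mmap'd buffer. `charset` is still a NUL-terminated C string. */
static const uint8_t *bounded_strpbrk(const uint8_t *source, const uint8_t *charset, ptrdiff_t length) {
    assert(charset != NULL);

    /* memchr over strlen(charset) bytes, unlike strchr, cannot match the
     * charset's own terminator, so a NUL byte in source is never a hit. */
    size_t charset_length = strlen((const char *) charset);

    for (ptrdiff_t index = 0; index < length; index++) {
        if (memchr(charset, source[index], charset_length) != NULL) {
            return source + index;
        }
    }
    return NULL; /* nothing from charset within the first `length` bytes */
}
```

Here, `bounded_strpbrk(buf, (const uint8_t *) "#", 4)` on the unterminated bytes `{'a', 'b', '#', 'c'}` returns `buf + 2`. With a length of 2, the same call returns `NULL` without reading past `buf[1]`.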
The standard library strpbrk also does not accept a maximum length to search. We need one for the reason mentioned above, and we also do not want the search to stop at null bytes. Ruby allows null bytes within strings, comments, regular expressions, and so on, so the search must be able to skip past them.
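The sketch below shows how embedded null bytes can be stepped over rather than treated as the end of input. It uses a 256-entry lookup table, which is a swapped-in technique for illustration and not necessarily how prism does it. The name `table_strpbrk` is hypothetical. The sketch assumes that `charset` is NUL-terminated. As a result, the table entry for byte 0 is never set, and a NUL in `source` is treated like any other non-matching byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table-driven bounded search. Each charset byte is marked in a
 * 256-entry table; the scan then covers exactly `length` bytes of source,
 * so embedded NUL bytes are skipped instead of ending the search. */
static const uint8_t *table_strpbrk(const uint8_t *source, const uint8_t *charset, ptrdiff_t length) {
    assert(charset != NULL);

    bool table[256] = { false };
    for (const uint8_t *c = charset; *c != '\0'; c++) {
        table[*c] = true; /* the terminator is never marked */
    }

    for (ptrdiff_t index = 0; index < length; index++) {
        if (table[source[index]]) {
            return source + index;
        }
    }
    return NULL;
}
```

For the bytes `{'a', 'b', '\0', '#'}` with a length of 4, this returns a pointer to the `#` at offset 3. By contrast, the standard `strpbrk((const char *) buf, "#")` stops at the NUL at offset 2 and returns `NULL`.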
Finally, we want to support encodings in which the charset could contain characters that are also valid trailing bytes of multi-byte characters. For example, in Shift_JIS, the backslash byte (0x5C) can be a trailing byte. In that case we need to take a slower path and iterate one multi-byte character at a time.
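The slower path can be sketched as follows. The code assumes a simplified Shift_JIS classification with these byte ranges:

- **Single-byte characters:** ASCII and half-width katakana 0xA1 to 0xDF.
- **Lead bytes:** 0x81 to 0x9F and 0xE0 to 0xFC.
- **Trail bytes:** 0x40 to 0x7E and 0x80 to 0xFC.

It also assumes the charset holds only ASCII bytes. The names `sjis_char_width` and `sjis_strpbrk` are hypothetical. The `invalid` counter is a hypothetical stand-in for whatever reporting prism performs through its parser when `validate` is true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified Shift_JIS width: 1 for ASCII and half-width
 * katakana, 2 for a valid lead/trail pair, 0 for an invalid sequence. */
static size_t sjis_char_width(const uint8_t *b, ptrdiff_t remaining) {
    if (b[0] < 0x80 || (b[0] >= 0xA1 && b[0] <= 0xDF)) return 1;

    bool lead = (b[0] >= 0x81 && b[0] <= 0x9F) || (b[0] >= 0xE0 && b[0] <= 0xFC);
    if (lead && remaining >= 2) {
        uint8_t trail = b[1];
        if ((trail >= 0x40 && trail <= 0x7E) || (trail >= 0x80 && trail <= 0xFC)) return 2;
    }
    return 0;
}

/* Slow-path sketch: advance one whole character at a time so that a trailing
 * byte such as 0x5C inside a two-byte character is never reported as a
 * match. Assumes an ASCII-only charset. When validate is true, invalid
 * sequences are counted in *invalid (a stand-in for parser diagnostics). */
static const uint8_t *sjis_strpbrk(const uint8_t *source, const uint8_t *charset,
                                   ptrdiff_t length, bool validate, size_t *invalid) {
    assert(charset != NULL);
    size_t charset_length = strlen((const char *) charset);
    ptrdiff_t index = 0;

    while (index < length) {
        size_t width = sjis_char_width(source + index, length - index);

        if (width == 1) {
            if (memchr(charset, source[index], charset_length) != NULL) {
                return source + index;
            }
            index += 1;
        } else if (width == 2) {
            index += 2; /* skip the lead byte and its trailing byte together */
        } else {
            if (validate && invalid != NULL) (*invalid)++;
            index += 1; /* resynchronize on the next byte */
        }
    }
    return NULL;
}
```

In Shift_JIS, the character 表 is encoded as 0x95 0x5C. A plain byte scan of `{0x95, 0x5C, 0x5C}` for a backslash would stop at offset 1, inside that character. This sketch instead returns offset 2, which is the real backslash. The cost is examining every character in order, which is why the text above calls it the slower path.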
Parameters
    parser    The parser.
    source    The source to search.
    charset   The charset to search for.
    length    The maximum number of bytes to search.
    validate  Whether to validate that the source string is valid in the current encoding of the parser.