Replies: 5 comments
-
The PEGTL does not optimise this case, and to be honest it probably never will, it doesn't fit well with our more minimalist approach. For the example we would probably pull the In that case I would probably write a custom rule, might be as easy as finding the first non-ASCII-letter in the input and then performing a lookup in a To make it even faster one could try fiddling with trie data structures or similar, but so far we have always found the performance of simpler approaches more than sufficient for our requirements... |
Beta Was this translation helpful? Give feedback.
-
Another option: The rule in the grammar would match any sequence of characters ( |
Beta Was this translation helpful? Give feedback.
-
You could look in to mixing in a tool like Ragel, which will perform this optimization, and then call the resulting generated code from a custom PEGTL rule. Ragel parsers match regular grammars and do not backtrack. Matching will be something like O(length of your longest string) Another option, as the guys above recommended, is to scan ahead to a delimiter and then test the string against a set. If you have 300 strings in your set, and you know what they are all going to be at compile time, you could take a look at perfect hashing. Applying a perfect hash will be O(1) regardless of how large your set is. Besides the classic gperf, there are easy to use implementations out there that can help you with this approach. Neither of these approaches will impose any additional runtime dependencies. |
Beta Was this translation helpful? Give feedback.
-
An article just hit the top of Hacker News today from PostgreSQL: they are now using perfect hashing in their SQL parser to detect (a set of 400) keywords: https://www.postgresql.org/message-id/flat/E1ghOVt-0007os-2V%40gemulon.postgresql.org |
Beta Was this translation helpful? Give feedback.
-
It looks like the original question was answered, and additional pointers given (thanks @nlyan) on how to work around the issue, so I'll close this for now. Feel free to re-open and/or continue the discussion if necessary! |
Beta Was this translation helpful? Give feedback.
-
Question: Suppose we have a definition like this
Will the library check through each
istring<>
and check for a match, or does it notice that 3 of them share an 'a'/'A' and create a suffix-tree like matcher so it only looks at the first letter once for matches?This is more of a curiosity question, however I am implementing something that will have unfortunately 300+
istring<>
's in something that performance is important (but not critical). I haven't profiled anything, I'm just exploring ideas and seeing what my bounds are.Beta Was this translation helpful? Give feedback.
All reactions