I'm working on a potential Hyperscan use case that requires compiling new databases during program execution.
Right now, I'm compiling a dataset of about 5400 plain strings (domain names with periods escaped, no special regex characters) with hs_compile_multi() in block mode and with flags HS_FLAG_CASELESS and HS_FLAG_SINGLEMATCH.
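For reference, the escaping step applied to each domain name looks roughly like the sketch below. `escape_periods` is a hypothetical helper name used only for illustration; it is not part of Hyperscan, and it assumes the input contains no regex metacharacters other than periods.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of `domain` in which
 * every '.' is written as "\." so it is treated as a literal period.
 * The caller frees the result. Returns NULL if allocation fails. */
static char *escape_periods(const char *domain)
{
    assert(domain != NULL);

    size_t len = strlen(domain);
    /* Worst case: every character is a period, so the output doubles. */
    char *out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (size_t i = 0; i < len; i++) {
        if (domain[i] == '.')
            *p++ = '\\';
        *p++ = domain[i];
    }
    *p = '\0';
    return out;
}
```

The escaped strings are what get passed as the expressions array to hs_compile_multi().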
I graphed compilation time for 10, 100, 500, and 1000 items, and then for every 1000 items up to 5000. Compilation time appears to increase exponentially with the number of items, and the final use case could require a dataset of 10k+ strings. Are there any optimizations I can try to speed up compilation?
Thanks in advance!
Could anyone clarify this part of the documentation:
Multiple pattern matching: Hyperscan allows matches to be reported
for several patterns simultaneously. This is not equivalent to
separating the patterns by | in libpcre, which evaluates
Does it mean that if a text matches several patterns, the order in which
the matching patterns are reported will be completely arbitrary
(i.e. dependent on the whole pattern set), or will there be some
predictable, though not left-to-right, order, such as from the least specific
pattern to the more specific ones? If it helps, I'm interested in the first
match only. Flags are: HS_FLAG_DOTALL | HS_FLAG_ALLOWEMPTY |
patterns: "12", "1", "13"
matches: "1", "12" (least specific to most specific)
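To make the "first match only" requirement concrete, here is a minimal sketch of a callback that keeps the first reported match and then asks the scanner to stop by returning non-zero. The parameter list mirrors Hyperscan's match_event_handler type; `first_match`, `on_match`, `event` and `replay` are hypothetical names, and `replay` only stands in for a real scan so the sketch does not need the library. The pattern ids are assumed to be the positions in the list above ("12" = 0, "1" = 1, "13" = 2), and the event order is taken from the output I observed, not from any documented guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context: remembers only the first match that is reported. */
struct first_match {
    int found;
    unsigned int id;          /* id of the pattern that matched */
    unsigned long long to;    /* end offset of the match */
};

/* Same parameters as Hyperscan's match_event_handler callback type. */
static int on_match(unsigned int id, unsigned long long from,
                    unsigned long long to, unsigned int flags, void *context)
{
    struct first_match *fm = context;
    assert(fm != NULL);
    (void)from;
    (void)flags;

    fm->found = 1;
    fm->id = id;
    fm->to = to;
    return 1; /* non-zero: request that scanning stop after this match */
}

/* Hypothetical stand-in for a scan: delivers events in the given order. */
struct event {
    unsigned int id;
    unsigned long long to;
};

static void replay(const struct event *ev, size_t n, struct first_match *fm)
{
    for (size_t i = 0; i < n; i++) {
        if (on_match(ev[i].id, 0, ev[i].to, 0, fm) != 0)
            break; /* stop delivering events once the callback asks to halt */
    }
}
```

With the events from the example above (pattern "1" ending at offset 1, then pattern "12" ending at offset 2), this handler would keep id 1 and never see the second event, which is why the reporting order matters for my use case.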