Hello,
I'm working on a potential Hyperscan use case that requires compiling new databases
during program execution.
Right now, I'm compiling a dataset of about 5400 plain strings (domain names with
periods escaped, no special regex characters) with hs_compile_multi() in block mode and
with flags HS_FLAG_CASELESS and HS_FLAG_SINGLEMATCH.
I graphed compilation time at 10, 100, 500, and 1000 items, and then at every 1000
items up to 5000. Compilation time appears to increase exponentially, and the final use
case could require a dataset of 10k+ strings. Any suggestions for optimizations I can try
to speed up compilation?
Thanks in advance!
Hi,
I have to say there isn't really a shortcut to reduce compile time significantly, especially
if you care about scan performance. Hyperscan runs complex analysis passes at compile time
to generate an optimized database that delivers the best scan performance we can achieve.
Based on our tests on pattern sets (which may differ from yours) ranging from 100 to 100k
fixed strings, we don't see an exponential increase in compile time. For the majority of
pattern sets, Hyperscan's compile time should be under 1s.
It would help if you could provide more details about your tests, such as the pattern
sets, test approach, etc.
BTW, are you using hsbench for your tests? If not, I would suggest using it to collect
different Hyperscan metrics, such as compile time and scan performance.
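A minimal invocation might look like the following; the exact options can vary by Hyperscan version, so check `hsbench --help`. The file names here are placeholders, not files from this thread:

```shell
# Hedged sketch: hsbench takes a pattern file (-e) and a corpus database (-c)
# and reports compile time alongside scan performance.
# patterns.txt and corpus.db are placeholder names.
hsbench -e patterns.txt -c corpus.db
```
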
Thanks,
Xiang
From: Hyperscan [mailto:hyperscan-bounces@lists.01.org] On Behalf Of Yvonne Chen
Sent: Saturday, April 28, 2018 9:13 AM
To: hyperscan(a)lists.01.org
Subject: [Hyperscan] Speeding up Hyperscan compilation time