Assuming N is the input length and M is the total 'pattern length' (some combination of pattern size and count).

I don't think there's a clean expression of this for compile; informally, we tried to avoid M^2 behavior at compile time but I imagine there are still corner cases for some pattern constructs that might blow up in that fashion (hopefully not worse). 

I'm pretty such that runtime is O(NM) - with some pretty 'hairy' variations in constant and additive factors depending on pattern type and input constants. Very early versions of Hyperscan (the paleolithic Hyperscan 1.0 - "you had to be there, but be glad you weren't") were potentially O(N^2*M) due to the fact that they scanned outwards from each literal 'factor' match. I think we worked pretty hard to not replicate that.

I am not currently working on Hyperscan, but I doubt that this behavior has changed from the 4.x to the 5.x release series.

Regards,
Geoff.



On Fri, Feb 1, 2019 at 12:23 PM Bobby Martin <bobbymartin2@gmail.com> wrote:
Thank you!

Next question: what are the bounds on time and space complexity of hs_compile and hs_scan in the length of the regular expression pattern?

On Wed, Nov 7, 2018 at 3:32 PM Geoff Langdale <geoff.langdale@gmail.com> wrote:
Hyperscan doesn't go exponential in the size of the searched string at any time. It also doesn't handle things that require backtracking (e.g. backreferences).

This question arises from a common confusion. Irritatingly, various people (Friedl springs to mind) have willfully ignored the theory and redefined NFA to mean "something used to match regular expressions in some sort of way, not necessarily finite nor an automaton". NFA algorithms, as defined in the theory and as used in Hyperscan, are linear in the size of the searched string (although they are in practice constant-time more expensive than the DFA implementations used within Hyperscan).

So if you use Hyperscan as-is, you will find that the features that could require exponential behavior from backtracking (actually, I don't think matching back-references is truly exponential for a fixed number of back-references, but that's another argument) are already disallowed and everything should be fine.

Regards,
Geoff Langdale.

On Thu, Nov 8, 2018 at 10:09 AM Bobby Martin <bobbymartin2@gmail.com> wrote:
Is there a way to configure Hyperscan to use only DFA state machines, or otherwise restrict the time complexity of matches to be polynomial at worst?

We need to ensure the algorithmic complexity of our matches doesn't go exponential in the size of the searched string, as NFA can for degenerate cases

We understand that this is incompatible with some regular expression features such as backreferences.

Thanks!
Bobby