Assume N is the input length and M is the total 'pattern length' (some
combination of pattern size and pattern count).
I don't think there's a clean expression of this for compile time;
informally, we tried to avoid O(M^2) behavior at compile time, but I imagine
there are still corner cases for some pattern constructs that might blow up
in that fashion (hopefully not worse).
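If you want to probe the compile-time side empirically, a rough harness
along these lines would do (my own sketch, not anything shipped with
Hyperscan; the alternation-of-literals pattern shape and the doubling sizes
are arbitrary illustration choices):

    /* Hypothetical harness: time hs_compile over doubling pattern sizes.
     * Link against libhs; error handling is minimal for brevity. */
    #include <hs.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        for (unsigned n = 64; n <= 4096; n *= 2) {
            /* Build an alternation of n short literals: foo0|foo1|... */
            char *pat = malloc(n * 16);
            char *p = pat;
            for (unsigned i = 0; i < n; i++)
                p += sprintf(p, "%sfoo%u", i ? "|" : "", i);

            hs_database_t *db = NULL;
            hs_compile_error_t *err = NULL;
            double t0 = now_sec();
            if (hs_compile(pat, 0, HS_MODE_BLOCK, NULL, &db, &err)
                    != HS_SUCCESS) {
                fprintf(stderr, "compile failed: %s\n", err->message);
                hs_free_compile_error(err);
                free(pat);
                return 1;
            }
            printf("M ~ %u literals: %.3fs\n", n, now_sec() - t0);
            hs_free_database(db);
            free(pat);
        }
        return 0;
    }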
I'm pretty sure that runtime is O(NM) - with some pretty 'hairy' variations
in constant and additive factors depending on pattern type and input
contents. Very early versions of Hyperscan (the paleolithic Hyperscan 1.0
- "you had to be there, but be glad you weren't") were potentially O(N^2*M)
because they scanned outwards from each literal 'factor' match. I think we
worked pretty hard to not replicate that.
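The runtime claim can be eyeballed the same way: compile one fixed database,
then time hs_scan over doubling input lengths and check that each step takes
roughly twice as long. Again just a sketch under my own assumptions (the
pattern, sizes, and non-matching filler are illustrative only):

    /* Hypothetical check of linear scaling in N for a fixed database.
     * Link against libhs. */
    #include <hs.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    static int on_match(unsigned int id, unsigned long long from,
                        unsigned long long to, unsigned int flags, void *ctx) {
        (void)id; (void)from; (void)to; (void)flags; (void)ctx;
        return 0; /* 0 = keep scanning after a match */
    }

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        hs_database_t *db = NULL;
        hs_compile_error_t *err = NULL;
        if (hs_compile("foo[a-z]{4,8}bar", HS_FLAG_DOTALL, HS_MODE_BLOCK,
                       NULL, &db, &err) != HS_SUCCESS) {
            fprintf(stderr, "compile failed: %s\n", err->message);
            hs_free_compile_error(err);
            return 1;
        }
        hs_scratch_t *scratch = NULL;
        if (hs_alloc_scratch(db, &scratch) != HS_SUCCESS) {
            hs_free_database(db);
            return 1;
        }
        for (unsigned n = 1u << 20; n <= 1u << 24; n <<= 1) {
            char *buf = malloc(n);
            memset(buf, 'x', n); /* non-matching filler */
            double t0 = now_sec();
            if (hs_scan(db, buf, n, 0, scratch, on_match, NULL) != HS_SUCCESS)
                fprintf(stderr, "scan failed at N=%u\n", n);
            else
                printf("N=%u: %.3fs\n", n, now_sec() - t0);
            free(buf);
        }
        hs_free_scratch(scratch);
        hs_free_database(db);
        return 0;
    }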
I am not currently working on Hyperscan, but I doubt that this behavior has
changed from the 4.x to the 5.x release series.
Regards,
Geoff.
On Fri, Feb 1, 2019 at 12:23 PM Bobby Martin <bobbymartin2(a)gmail.com> wrote:
Thank you!
Next question: what are the bounds on time and space complexity of
hs_compile and hs_scan in the length of the regular expression pattern?
On Wed, Nov 7, 2018 at 3:32 PM Geoff Langdale <geoff.langdale(a)gmail.com>
wrote:
> Hyperscan doesn't go exponential in the size of the searched string at
> any time. It also doesn't handle things that require backtracking (e.g.
> backreferences).
>
> This question arises from a common confusion. Irritatingly, various
> people (Friedl springs to mind) have willfully ignored the theory and
> redefined NFA to mean "something used to match regular expressions in some
> sort of way, not necessarily finite nor an automaton". NFA algorithms, as
> defined in the theory and as used in Hyperscan, are linear in the size of
> the searched string (although in practice they carry a higher constant
> factor than the DFA implementations used within Hyperscan).
>
> So if you use Hyperscan as-is, you will find that the features that could
> require exponential behavior from backtracking (actually, I don't think
> matching back-references is truly exponential for a fixed number of
> back-references, but that's another argument) are already disallowed and
> everything should be fine.
>
> Regards,
> Geoff Langdale.
>
> On Thu, Nov 8, 2018 at 10:09 AM Bobby Martin <bobbymartin2(a)gmail.com>
> wrote:
>
>> Is there a way to configure Hyperscan to use only DFA state machines, or
>> otherwise restrict the time complexity of matches to be polynomial at worst?
>>
>> We need to ensure the algorithmic complexity of our matches doesn't go
>> exponential in the size of the searched string, as NFA engines can for
>> degenerate cases.
>>
>> We understand that this is incompatible with some regular expression
>> features such as backreferences.
>>
>> Thanks!
>> Bobby
>>
>