Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

[Note: This comment tree is about the Redis code, which uses glob-style patterns rather than regex. So in this comment and in my grandparent comment, "*" means "any string" rather than "zero or more of the previous pattern".]

An NFA / DFA would be a perfectly valid way to implement an efficient matcher, but it's not the only way.

For example, memoization (https://news.ycombinator.com/item?id=32434412#32436442) can be easily shown to have time complexity O(P * T) where P = len(pattern) and T = len(text), which is the same as a basic NFA implementation.

In the case of the Redis code, my proposed fix requires much less code and memory than memoization or an NFA, and has time complexity O(P * T). It's slightly harder to prove this bound; here is a sketch of the proof:

Matching against a pattern fragment that does not contain "*" takes O(len(pattern_fragment)).

When a call to stringmatchlen() encounters a wildcard, it may recurse O(T) times (and then it returns). Each recursive call can be classified either "leaf" (if there is a match failure before the next wildcard) or "full" (if the match reaches the next wildcard).

The insight I described leads to at most one "full" call at each level of the recursion (i.e. for each wildcard in the pattern). So for a particular pattern fragment, the number of "leaf" calls that process it is O(T). Hence the overall time complexity is O(P * T).



See also: https://github.com/google/re2/blob/main/re2/bitstate.cc

Although its space complexity is P*T, which limits its use to small regexes/haystacks. It sounds like your claim is that your technique achieves a smaller space bound?


Yes. My proposed fix takes O(P * T) time and O(1) memory (but it only supports glob-style wildcards; it doesn't support repetition of specific characters / character classes / sub-patterns).


Ah gotya. An important clarification indeed. Thanka for that!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: