[Note: This comment tree is about the Redis code, which uses glob-style patterns rather than regex. So in this comment and in my grandparent comment, "*" means "any string" rather than "zero or more of the previous pattern".]
An NFA / DFA would be a perfectly valid way to implement an efficient matcher, but it's not the only way.
For example, memoization (https://news.ycombinator.com/item?id=32434412#32436442) can be easily shown to have time complexity O(P * T) where P = len(pattern) and T = len(text), which is the same as a basic NFA implementation.
In the case of the Redis code, my proposed fix requires much less code and memory than memoization or an NFA, and has time complexity O(P * T). It's slightly harder to prove this bound; here is a sketch of the proof:
Matching against a pattern fragment that does not contain "*" takes O(len(pattern_fragment)).
When a call to stringmatchlen() encounters a wildcard, it may recurse O(T) times (and then it returns). Each recursive call can be classified either "leaf" (if there is a match failure before the next wildcard) or "full" (if the match reaches the next wildcard).
The insight I described leads to at most one "full" call at each level of the recursion (i.e. for each wildcard in the pattern). So for a particular pattern fragment, the number of "leaf" calls that process it is O(T). Hence the overall time complexity is O(P * T).
Although its space complexity is P*T, which limits its use to small regexes/haystacks. It sounds like your claim is that your technique achieves a smaller space bound?
Yes. My proposed fix takes O(P * T) time and O(1) memory (but it only supports glob-style wildcards; it doesn't support repetition of specific characters / character classes / sub-patterns).
An NFA / DFA would be a perfectly valid way to implement an efficient matcher, but it's not the only way.
For example, memoization (https://news.ycombinator.com/item?id=32434412#32436442) can be easily shown to have time complexity O(P * T) where P = len(pattern) and T = len(text), which is the same as a basic NFA implementation.
In the case of the Redis code, my proposed fix requires much less code and memory than memoization or an NFA, and has time complexity O(P * T). It's slightly harder to prove this bound; here is a sketch of the proof:
Matching against a pattern fragment that does not contain "*" takes O(len(pattern_fragment)).
When a call to stringmatchlen() encounters a wildcard, it may recurse O(T) times (and then it returns). Each recursive call can be classified either "leaf" (if there is a match failure before the next wildcard) or "full" (if the match reaches the next wildcard).
The insight I described leads to at most one "full" call at each level of the recursion (i.e. for each wildcard in the pattern). So for a particular pattern fragment, the number of "leaf" calls that process it is O(T). Hence the overall time complexity is O(P * T).