Streaming regex parsing with version 0.3.6 #1071
-
I've just tried upgrading …

The first issue is somewhat minor, but I like the new …

The second issue leaves me puzzled as to how the API is supposed to work. Previously I'd switch states using …

The following log should explain things: …

I'm using the simplest form of DFA here (forward, not anchored, return earliest match). I'd expect this to return as soon as …

Now if earliest was false, I'd understand this, since it needs to check if it might be …
Replies: 3 comments 10 replies
-
I believe this is answered in the discussion for this PR: #1031. The TL;DR is that the discussion should give you a path forward with the current API, but it's non-obvious. It should be possible to compute the start state without providing a full `Input`. Instead, there will be a new type that mostly mirrors `Input`, but instead of accepting a full `&[u8]` haystack, it will accept a single (optional) …

As for your other question, it's not particularly clear what's going on because I don't see a haystack and I don't see any code. My best guess is that perhaps you aren't using the end-of-input (EOI) transition? One change from …
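To make the EOI point concrete, here is a minimal sketch using a hand-written toy DFA for the pattern `ab`. It is not regex-automata's implementation, and every name in it (`State`, `next_state`, `earliest_match_end`) is hypothetical. It assumes the behavior the crate's `Automaton` documentation describes, as far as I understand it: a match state is only entered one transition after the last byte of the match, so a match ending exactly at the end of the haystack is only visible after an explicit EOI transition (`next_eoi_state` in the real API).

```rust
/// Toy DFA states for the unanchored pattern `ab`.
/// Hypothetical illustration only; these are not regex-automata types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum State {
    Start,
    SeenA,
    /// "ab" has been consumed, but the match is not reported yet.
    Pending,
    /// Entered one transition after `Pending` (delayed match state).
    Match,
}

/// Transition function. `None` stands in for the end-of-input (EOI) sentinel.
fn next_state(state: State, input: Option<u8>) -> State {
    match (state, input) {
        (State::Match, _) => State::Match,
        // Any further input, including EOI, confirms the pending match.
        (State::Pending, _) => State::Match,
        (_, Some(b'a')) => State::SeenA,
        (State::SeenA, Some(b'b')) => State::Pending,
        _ => State::Start,
    }
}

/// Returns the end offset (exclusive) of the earliest match, if any.
/// `feed_eoi` controls whether the EOI transition is taken after the last byte.
fn earliest_match_end(haystack: &[u8], feed_eoi: bool) -> Option<usize> {
    let mut state = State::Start;
    for (i, &byte) in haystack.iter().enumerate() {
        state = next_state(state, Some(byte));
        if state == State::Match {
            // The byte at `i` only confirmed the match; it is not part of it.
            return Some(i);
        }
    }
    if feed_eoi {
        state = next_state(state, None);
        if state == State::Match {
            return Some(haystack.len());
        }
    }
    None
}
```

In this sketch, `earliest_match_end(b"xxab", false)` finds nothing, while the same call with `feed_eoi` set to `true` reports the match ending at offset 4. That mirrors the symptom described in the question: without the EOI step, the match only shows up once one more byte has been processed.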
-
Is there an option to get back the old behavior if I don't care about this?

That said, I don't see why I'd need to process another byte, since I haven't reached the end of my haystack (and with streaming searches I won't ever reach it in a lot of cases). Considering the input, it should always be clear that the end state is reached, even without processing an additional byte, right? I'd assume with the current version of …
-
Thanks for pointing that out. My current solution works by always parsing one character more, but that might have some sneaky error cases my tests don't cover.

I've gotten this to pass all my tests by handling EOI and cutting off the last char (for now), but I noticed that by constructing …
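The "handle EOI and cut off the last char" approach can be sketched for chunked input roughly as follows. This is a toy model under stated assumptions, not the poster's code and not regex-automata's API: the DFA is hard-coded for the pattern `ab`, match states are assumed to be delayed by one transition, and the names `S`, `step`, and `StreamSearch` are made up for illustration. The searcher keeps its state and an absolute byte counter across chunks, and reports the offset before the confirming byte so that byte is not counted as part of the match.

```rust
/// Toy DFA states for the pattern `ab` (hypothetical; not regex-automata types).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum S {
    Start,
    A,
    Pending, // "ab" consumed, match not yet reported
    Match,   // delayed match state
}

/// Transition function; `None` models the end-of-input (EOI) transition.
fn step(s: S, input: Option<u8>) -> S {
    match (s, input) {
        (S::Match, _) | (S::Pending, _) => S::Match,
        (_, Some(b'a')) => S::A,
        (S::A, Some(b'b')) => S::Pending,
        _ => S::Start,
    }
}

/// Carries DFA state and the absolute offset across chunk boundaries.
struct StreamSearch {
    state: S,
    consumed: usize,
}

impl StreamSearch {
    fn new() -> Self {
        Self { state: S::Start, consumed: 0 }
    }

    /// Feeds one chunk. Returns the absolute end offset (exclusive) of the
    /// earliest match once a byte in this chunk confirms it.
    fn feed(&mut self, chunk: &[u8]) -> Option<usize> {
        for &b in chunk {
            self.state = step(self.state, Some(b));
            if self.state == S::Match {
                // `b` sits at offset `self.consumed` and only confirmed the
                // match, so it is "cut off" by not advancing the counter.
                return Some(self.consumed);
            }
            self.consumed += 1;
        }
        None
    }

    /// Call once the stream is known to be finished: takes the EOI transition.
    fn finish(&mut self) -> Option<usize> {
        self.state = step(self.state, None);
        (self.state == S::Match).then_some(self.consumed)
    }
}
```

With chunks `"xxa"`, `"b"`, `"yy"`, this sketch reports the match end at offset 4 only while processing the third chunk. A stream that stops right after `"xxab"` needs `finish()` to surface the same result, which is the case a search that never reaches the end of its input cannot cover by itself.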