How to detect if the state is in a partial (prefix) match state? #1065
-
Hi, Thanks for the awesome For example, assuming we have a regex pattern We were able to do that with the following logic in if self.dfa.is_dead_state(state) {
return Some(false); // dead state
} else if self.dfa.is_match_state(state) {
return Some(true); // match state
}
None // partially match state (match the prefix) In the new API of use anyhow::Result;
#[test]
fn test_dense_dfa() -> Result<()> {
use regex_automata::{
dfa::{dense, Automaton},
Input,
};
let dfa = dense::DFA::new(r"a/t\d+/")?;
let haystack = "a/";
let mut state = dfa.start_state_forward(&Input::new(haystack))?;
for &b in haystack.as_bytes().iter() {
state = dfa.next_state(state, b);
}
state = dfa.next_eoi_state(state);
assert!(dfa.is_dead_state(state));
Ok(())
}
#[test]
fn test_lazy_dfa() -> Result<()> {
use regex_automata::{hybrid::dfa::DFA, Input};
let dfa = DFA::new(r"a/t\d+/")?;
let mut cache = dfa.create_cache();
let haystack = "a/".as_bytes();
let mut sid = dfa.start_state_forward(&mut cache, &Input::new(haystack))?;
for &b in haystack {
sid = dfa.next_state(&mut cache, sid, b)?;
}
sid = dfa.next_eoi_state(&mut cache, sid)?;
assert!(sid.is_dead());
Ok(())
} |
Beta Was this translation helpful? Give feedback.
Replies: 2 comments 8 replies
-
Why do you think the EOI transition is necessary here? You should only call it if you know you've reached the end of input. If you're looking for partial matches, then how do you know when you've reached the end of input? The EOI transition will of course lead to a dead state if it wouldn't otherwise be a match because there is no other transition that can follow it. By invoking the EOI transition you are saying, "there is no more haystack to search." But that doesn't seem to be true. |
Beta Was this translation helpful? Give feedback.
-
Thanks for the prompt reply.
Sorry for the confusion, our use case is: in order to avoid unnecessary directory scans, we first check if a directory path is a potential match against the pattern, then decide if we need to look into (scan) that directory and do further checks (e.g. check inputs let dfa = dense::DFA::new(r"a/t\d+/")?;
let haystack = "a/"; In order to decide if we need to look into the directory path
When checking directory path match self.match_prefix("a/") {
Some(true) => {
// match state: DirectoryMatch::Everything
}
Some(false) => {
// dead state: DirectoryMatch::Nothing,
}
None => {
// match prefix: DirectoryMatch::ShouldTraverse,
for path in fs::read_dir("a/").unwarp() {
// do separate search against paths inside `a/`
self.match_prefix(path)
}
}
}; |
Beta Was this translation helpful? Give feedback.
Yes, that's correct, because it looks like you're doing an unanchored search. So that
a/t\d/
can match anywhere. TryInput::new(haystack).anchored(Anchored::Yes)
.