Indexing Finite-State Automata Using Forward-Stable Partitions

Ruben Becker, Sung-Hwan Kim, Nicola Prezza, Carlo Tosoni

TL;DR

This work tackles the problem of efficiently indexing languages recognized by nondeterministic finite automata (NFAs) by leveraging forward-stable partitions, generalizing Wheeler-like approaches to width greater than one. It introduces coarsest forward-stable co-lex (CFS) orders, proves their existence and uniqueness, and shows that they can be computed in $O(|\delta|^{2})$ time. Crucially, the CFS width never exceeds the width of the maximum co-lex relation, and in some automata families it is asymptotically smaller. The authors also establish that the quotient $\mathcal{A}/_{\sim_{FS}}$ always admits a maximum co-lex order, and that the CFS framework yields a more compact state representation than previous quotients based on the maximum co-lex relation. Overall, this provides a general method to construct efficient indices for arbitrary NFAs, enabling FM-index–style pattern queries beyond Wheeler NFAs and potentially improving index size and query performance.

Abstract

An index on a finite-state automaton is a data structure able to locate specific patterns on the automaton's paths and consequently on the regular language accepted by the automaton itself. Cotumaccio and Prezza [SODA '21] introduced a data structure able to solve pattern matching queries on automata, generalizing the famous FM-index for strings of Ferragina and Manzini [FOCS '00]. The efficiency of their index depends on the width of a particular partial order of the automaton's states: the smaller the width of the partial order, the faster the index. However, computing the partial order of minimal width is NP-hard. This problem was mitigated by Cotumaccio [DCC '22], who relaxed the conditions on the partial order, allowing it to be a partial preorder. This relaxation yields the existence of a unique partial preorder of minimal width that can be computed in polynomial time. In the paper at hand, we present a new class of partial preorders and show that they have the following useful properties: (i) they can be computed in polynomial time, (ii) their width is never larger than the width of Cotumaccio's preorders, and (iii) there exist infinite classes of automata on which the width of Cotumaccio's preorder is linearly larger than the width of our preorder.
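To make the query concrete, the following minimal sketch answers a pattern-matching query on an automaton by brute force: it returns the states reached by some path labeled with the pattern. It is only a baseline that illustrates what an index has to answer, not the data structure of Cotumaccio and Prezza. The function name `match_states`, the edge-triple representation, and the small example automaton `EDGES` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical example automaton, given as (source, label, target) triples.
EDGES = [
    (1, "a", 2), (1, "b", 3), (3, "b", 4),
    (2, "a", 5), (2, "a", 6), (3, "a", 5), (3, "a", 6),
]

def match_states(edges, pattern):
    """Return the set of states reached by some path labeled `pattern`.

    A path may start at any state, so the search begins with all states
    and follows the pattern one character at a time.
    """
    step = defaultdict(set)          # (state, label) -> set of successors
    states = set()
    for u, a, v in edges:
        step[(u, a)].add(v)
        states.update((u, v))

    current = states
    for ch in pattern:
        current = {v for u in current for v in step.get((u, ch), ())}
        if not current:              # no path spells the pattern read so far
            break
    return current

print(match_states(EDGES, "ba"))     # states reached by a path labeled "ba"
```

This scan touches every candidate edge for each pattern character, which is the cost an index whose speed depends on the width of the order is meant to avoid.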

Paper Structure

This paper contains 19 sections, 20 theorems, and 3 figures.

Key Result

Lemma [Cotumaccio, DCC '22]

Let $\leq_{R}$ be the maximum co-lex relation of an automaton $\mathcal{A}=(Q,\delta,\Sigma,s)$, and let $\mathcal{A}/_{\sim_{R}}=(Q/_{\sim_{R}},\delta/_{\sim_{R}},\Sigma,s/_{\sim_{R}})$ be the quotient automaton of $\mathcal{A}$ defined by $\leq_{R}$. Then the partial order ...

Figures (3)

  • Figure 1: An NFA $\mathcal{A}$ is input consistent if, for each state $u$ in $\mathcal{A}$, all incoming edges of $u$ are labeled with the same character. We show the connections between the different relations described. A relation is orange if every automaton admits an instance of that relation, and blue otherwise. Orders on the left, i.e., Wheeler preorders and Wheeler orders, are of width 1, while the others may be of arbitrary width. A blue edge $A \rightarrow B$ means that the width of any relation of type $A$ is always larger than or equal to the width of a relation of type $B$. A black edge $A \xrightarrow{c} B$ means that a relation of type $A$ is also a relation of type $B$ if it satisfies the requirements $c$. In this case, a relation of type $B$ is always also a relation of type $A$, with the following exceptions: (i) The coarsest forward-stable co-lex order may not be equal to the maximum co-lex relation. (ii) If the maximum co-lex order exists, then it may not be equal to the coarsest forward-stable co-lex order. (iii) A Wheeler order may not be equal to the Wheeler preorder. All implications either follow directly from the definitions or are proved in the appendix.
  • Figure 2: An NFA $\mathcal{A}=(Q,\delta,\Sigma,s)$ on the left. On the right, the corresponding quotient automaton $\mathcal{A}/_{\sim_{FS}}=(Q/_{\sim_{FS}},\delta/_{\sim_{FS}},\Sigma,s/_{\sim_{FS}})$ of $\mathcal{A}$ for the coarsest forward-stable partition $Q/_{\sim_{FS}} = \{ \{u_{0}\}, \{u_{1}, u_{2}\}, \{u_{3}, u_{4}\}, \{u_{5}, u_{6}\} \}$.
  • Figure 3: (a) An NFA $\mathcal{A}=(Q,\delta,\Sigma,s)$ with $\Sigma = \{a,b\}$ and $Q=\{u_{1},...,u_{n}\}$, with $n > 4$, where $u_{1}=s$ is the initial state and $\delta$ is such that $u_{2} \in \delta_{a}(u_{1})$, $u_{3} \in \delta_{b}(u_{1})$, $u_{4} \in \delta_{b}(u_{3})$, and for each $4 < i \leq n$, $u_{i} \in \delta_{a}(u_{2})$ and $u_{i} \in \delta_{a}(u_{3})$. We observe that $\mathcal{A}/_{\sim_{R}}$ is equal to $\mathcal{A}$ itself. Note that the example is not trivial, as states $u_{2}$ and $u_{3}$ may not be merged without changing the language of $\mathcal{A}$ (due to state $u_{4}$). (b) The Hasse diagram of the maximum co-lex order of $\mathcal{A}/_{\sim_{R}}$, where $\{u_{5},...,u_{n}\}$ forms a largest antichain; consequently, the co-lexicographic width of $\mathcal{A}/_{\sim_{R}}$ is equal to $n-4$. (c) The automaton $\mathcal{A}/_{\sim_{FS}}$ consisting of five states, where for each $4 < i \leq n$, $u_{i} \in [u_{n}]_{\sim_{FS}}$. (d) The Hasse diagram of the maximum co-lex order of $\mathcal{A}/_{\sim_{FS}}$. Note that this total order is also a Wheeler order of $\mathcal{A}/_{\sim_{FS}}$, i.e., the co-lexicographic width of $\mathcal{A}/_{\sim_{FS}}$ is equal to 1.
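
The collapse described in Figure 3(c) can be reproduced with a small sketch. It assumes the usual definition of forward-stability, which is not restated on this page: a partition is forward-stable if, for all parts $S$, $T$ and every character $a$, either $S \subseteq \delta_{a}(T)$ or $S \cap \delta_{a}(T) = \emptyset$. The code below is a naive refinement loop written for illustration, not the authors' algorithm, and the names `figure3_edges`, `coarsest_forward_stable`, and `quotient_edges` are hypothetical. State `i` stands for $u_{i}$.

```python
from collections import defaultdict

def figure3_edges(n):
    """Edges (source, label, target) of the NFA in Figure 3(a)."""
    edges = [(1, "a", 2), (1, "b", 3), (3, "b", 4)]
    for i in range(5, n + 1):
        edges += [(2, "a", i), (3, "a", i)]
    return edges

def coarsest_forward_stable(states, edges):
    """Naive refinement (assumed definition): split parts until every part S
    is contained in or disjoint from delta_a(T), for all parts T and labels a."""
    succ = defaultdict(set)                  # (state, label) -> successors
    for u, a, v in edges:
        succ[(u, a)].add(v)
    labels = sorted({a for _, a, _ in edges})

    partition = {frozenset(states)}          # start from the trivial partition
    changed = True
    while changed:
        changed = False
        for T in list(partition):
            for a in labels:
                image = set().union(*(succ.get((u, a), set()) for u in T))
                refined = set()
                for S in partition:
                    inside, outside = S & image, S - image
                    if inside and outside:   # S straddles delta_a(T): split it
                        refined.update((inside, outside))
                        changed = True
                    else:
                        refined.add(S)
                partition = refined
    return partition

def quotient_edges(partition, edges):
    """Quotient transitions: [u] -a-> [v] whenever u -a-> v in the original."""
    cls = {u: part for part in partition for u in part}
    return {(cls[u], a, cls[v]) for u, a, v in edges}

n = 7
parts = coarsest_forward_stable(range(1, n + 1), figure3_edges(n))
print(sorted(sorted(p) for p in parts))      # [[1], [2], [3], [4], [5, 6, 7]]
print(len(quotient_edges(parts, figure3_edges(n))))  # 5
```

For $n = 7$ the states $u_{5}, u_{6}, u_{7}$ end up in a single class, which gives the five-state quotient of Figure 3(c). The loop repeats full passes over all parts and labels until nothing splits, so it favors readability over speed.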

Theorems & Definitions (48)

  • Definition: Strings reaching a state
  • Definition: Forward-Stability
  • Definition: Quotient automaton
  • Definition: Wheeler NFAs
  • Definition: Quasi-Wheeler NFAs
  • Definition: Co-lex orders
  • Definition: Co-lexicographic width
  • Definition: Indexable partial preorders
  • Definition: Co-lex relations
  • Lemma
  • ...and 38 more