Indexing Finite-State Automata Using Forward-Stable Partitions
Ruben Becker, Sung-Hwan Kim, Nicola Prezza, Carlo Tosoni
TL;DR
This work tackles the problem of efficiently indexing languages recognized by nondeterministic finite automata (NFAs) by leveraging forward-stable partitions, generalizing Wheeler-like approaches to width greater than one. It introduces coarsest forward-stable co-lex (CFS) orders, proving their existence and uniqueness and showing they can be computed in $O(|\delta|^{2})$ time; crucially, the CFS width never exceeds the width of the maximal co-lex relation, and in some automata families it is asymptotically smaller. The authors also establish that the quotient $\mathcal{A}/_{FS}$ always admits a maximum co-lex order, and that the CFS framework yields a more compact state representation than previous max-co-lex-based quotients. Overall, this provides a scalable, general method to construct efficient indices for arbitrary NFAs, enabling FM-index–style pattern queries beyond Wheeler NFAs and potentially improving index size and query performance.
Abstract
An index on a finite-state automaton is a data structure able to locate specific patterns on the automaton's paths and consequently on the regular language accepted by the automaton itself. Cotumaccio and Prezza [SODA '21] introduced a data structure able to solve pattern matching queries on automata, generalizing the famous FM-index for strings of Ferragina and Manzini [FOCS '00]. The efficiency of their index depends on the width of a particular partial order of the automaton's states: the smaller the width of the partial order, the faster the index. However, computing the partial order of minimal width is NP-hard. This problem was mitigated by Cotumaccio [DCC '22], who relaxed the conditions on the partial order, allowing it to be a partial preorder. This relaxation yields the existence of a unique partial preorder of minimal width that can be computed in polynomial time. In this paper, we present a new class of partial preorders and show that they have the following useful properties: (i) they can be computed in polynomial time, (ii) their width is never larger than the width of Cotumaccio's preorders, and (iii) there exist infinite classes of automata on which the width of Cotumaccio's preorder is linearly larger than the width of our preorder.
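To make the notion of width concrete: the width of a partial order is the size of its largest antichain, that is, the largest set of pairwise incomparable elements. The following is a minimal sketch, not the construction used in the paper. It assumes a strict partial order on states numbered 0..n-1, given as a set of pairs, and computes the width via Dilworth's theorem (width = n minus a maximum matching in the bipartite comparability graph). The function name `width` and the input encoding are hypothetical, chosen only for illustration.

```python
def width(n, less_than):
    """Width (size of a largest antichain) of a strict partial order on 0..n-1.

    less_than: iterable of (u, v) pairs meaning u < v. The pairs need not be
    transitively closed, but they are assumed to be acyclic (illustrative
    encoding, not taken from the paper).
    """
    # Transitive closure of the relation (Warshall's algorithm).
    lt = [[False] * n for _ in range(n)]
    for u, v in less_than:
        lt[u][v] = True
    for k in range(n):
        for i in range(n):
            if lt[i][k]:
                for j in range(n):
                    if lt[k][j]:
                        lt[i][j] = True

    # Maximum bipartite matching (Kuhn's augmenting-path algorithm) on the
    # graph with an edge (u, v) whenever u < v in the closed order.
    match_right = [-1] * n

    def try_augment(u, seen):
        for v in range(n):
            if lt[u][v] and not seen[v]:
                seen[v] = True
                if match_right[v] == -1 or try_augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    matching = sum(1 for u in range(n) if try_augment(u, [False] * n))

    # Dilworth's theorem: the minimum number of chains covering the order
    # equals the size of a largest antichain, and a minimum chain cover
    # has n - (maximum matching) chains.
    return n - matching
```

On a total order such as 0 < 1 < 2 < 3 this returns 1, which corresponds to the Wheeler case, while a set of n pairwise incomparable states has width n. The sketch handles partial orders only; for a preorder one would first need to fix how equivalent elements are treated, which is not attempted here.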
