A Close Analysis of the Subset Construction
Ivan Baburin, Ryan Cotterell
TL;DR
This work analyzes the determinization problem for NFAs, focusing on the size of the DFA produced by the subset construction and the inherent difficulty of forecasting exponential blow-up. The authors prove that both exact and coarse-grained forecasts of DFA state complexity are PSPACE-hard, and they show the same hardness for predicting the size of the subset automaton. To address this, they introduce subset complexity, an upper bound on the size of the subset construction's output. This quantity can itself be bounded efficiently using the cyclicity of the transition graph and the rank of the transition matrices, which enables practical estimates. The results unify algebraic and combinatorial perspectives on determinization and yield a tractable sufficient criterion for identifying NFAs that can be determinized efficiently via the subset construction, with potential implications for automata tooling and analysis.
Abstract
Given a nondeterministic finite-state automaton (NFA), we aim to estimate the size of an equivalent deterministic finite-state automaton (DFA). We demonstrate that computing the state complexity of an NFA within polynomial precision is PSPACE-hard. Furthermore, we demonstrate that it is PSPACE-hard to decide whether the classical subset construction will yield an equivalent DFA with an exponential increase in the number of states. This result implies that making any a priori estimate of the running time of the subset construction is inherently difficult. To address this, and to enable forecasting of such exponential blow-up in certain special cases, we introduce the notion of subset complexity, which provides an upper bound on the size of the DFA produced by the subset construction. We show that the subset complexity can be efficiently bounded above using the cyclicity and rank of the transition matrices of the NFA. This yields a sufficient condition for identifying NFAs that can be efficiently determinized via the subset construction.
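For readers who want to see the blow-up concretely, the following is a minimal sketch of the classical subset construction, restricted to counting the reachable subset states. It is not taken from the paper: the function names `subset_construction` and `kth_from_end_nfa`, the dictionary encoding of the transition relation, and the choice of example are all assumptions made for illustration. The example NFA is the standard textbook one that accepts strings over {a, b} whose k-th symbol from the end is a; accepting states are omitted because they do not affect the number of reachable subsets.

```python
from collections import deque


def subset_construction(alphabet, delta, initial):
    """Return the set of subset states reachable in the subset construction.

    delta maps (state, symbol) to a set of successor states (hypothetical
    encoding chosen for this sketch); missing keys mean "no transition".
    """
    start = frozenset(initial)
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            # Union of the successors of every NFA state in the current subset.
            successor = frozenset(
                target
                for state in current
                for target in delta.get((state, symbol), ())
            )
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


def kth_from_end_nfa(k):
    """Transitions of an NFA with states 0..k for 'k-th symbol from the end is a'."""
    # State 0 loops on both symbols and guesses that the current 'a' is the marked one.
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    # States 1..k-1 simply count the remaining symbols; state k has no successors.
    for i in range(1, k):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta


if __name__ == "__main__":
    for k in range(1, 6):
        subsets = subset_construction("ab", kth_from_end_nfa(k), {0})
        print(f"NFA states: {k + 1}, reachable subset states: {len(subsets)}")
```

On this family the number of reachable subsets grows as 2^k for an NFA with k + 1 states, since each reachable subset records which of the last k symbols were a. The sketch only discovers this by running the construction to completion, which is exactly the cost that an a priori estimate would be meant to avoid.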
