Random Deterministic Automata With One Added Transition

Arnaud Carayol, Philippe Duchon, Florent Koechlin, Cyril Nicaud

TL;DR

This paper investigates the state complexity of languages recognized by random almost deterministic automata, obtained by adding a single random transition to a uniform random $n$-state deterministic automaton. It establishes that for any fixed $d \geq 1$, with non-negligible probability the minimal DFA recognizing the resulting language has more than $n^d$ states, which implies that the expected state complexity grows faster than any polynomial. The proof develops a probabilistic framework built on backward substructures, forward trees, and $b$-threads, and reduces the key calculations to Galton–Watson processes with Poisson(2) offspring by means of a novel template formalism. The results show that the powerset construction can blow up with non-negligible probability even under minimal non-determinism, and they give insight into the distribution of regular languages induced by random deterministic automata.

Abstract

Every language recognized by a non-deterministic finite automaton can be recognized by a deterministic automaton, at the cost of a potential increase in the number of states, which in the worst case can go from $n$ states to $2^n$ states. In this article, we investigate this classical result in a probabilistic setting where we take a deterministic automaton with $n$ states uniformly at random and add just one random transition. These automata are almost deterministic in the sense that only one state has a non-deterministic choice when reading an input letter. In our model, each state has a fixed probability to be final. We prove that for any $d\geq 1$, with non-negligible probability the minimal (deterministic) automaton of the language recognized by such an automaton has more than $n^d$ states; as a byproduct, the expected size of its minimal automaton grows faster than any polynomial. Our result also holds when each state is final with some probability that depends on $n$, as long as it is not too close to $0$ or $1$, at distance at least $\Omega(\frac{1}{\sqrt{n}})$ to be precise, therefore allowing models with a sublinear number of final states in expectation.
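The model described in the abstract can be explored empirically for small $n$. The following is a minimal sketch, not the authors' code. It rests on several assumptions that the abstract does not fix: a two-letter alphabet `"ab"`, initial state `0`, an added transition drawn uniformly among the transitions not already present, and hypothetical function names throughout. It samples one automaton, applies the powerset construction, and counts the states of the minimal DFA by Moore partition refinement.

```python
import random
from collections import deque

def random_dfa(n, alphabet, p_final, rng):
    """Uniform random complete DFA on states 0..n-1; each state is final with prob. p_final."""
    delta = {(q, a): {rng.randrange(n)} for q in range(n) for a in alphabet}
    finals = {q for q in range(n) if rng.random() < p_final}
    return delta, finals

def add_random_transition(delta, n, alphabet, rng):
    """Add one transition p -a-> q, uniform among those not already present (assumption)."""
    while True:
        p, a, q = rng.randrange(n), rng.choice(alphabet), rng.randrange(n)
        if q not in delta[(p, a)]:
            delta[(p, a)].add(q)
            return p, a, q

def determinize(delta, finals, alphabet, start=0):
    """Powerset construction restricted to the subsets reachable from {start}."""
    init = frozenset([start])
    index, order, queue, trans = {init: 0}, [init], deque([init]), []
    while queue:
        s = queue.popleft()
        row = []
        for a in alphabet:
            t = frozenset(r for q in s for r in delta[(q, a)])
            if t not in index:
                index[t] = len(order)
                order.append(t)
                queue.append(t)
            row.append(index[t])
        trans.append(row)  # rows are appended in the same order as indices are assigned
    accepting = [bool(s & finals) for s in order]
    return trans, accepting

def minimal_size(trans, accepting):
    """Number of Moore equivalence classes of an accessible complete DFA."""
    cls = [int(f) for f in accepting]
    count = len(set(cls))
    while True:
        sig = {}
        cls = [sig.setdefault((cls[q], *(cls[t] for t in row)), len(sig))
               for q, row in enumerate(trans)]
        if len(sig) == count:  # no class was split: the partition is stable
            return count
        count = len(sig)

def sample_state_complexity(n, p_final=0.5, alphabet="ab", seed=None):
    """State complexity of one random almost deterministic automaton (requires n >= 2)."""
    rng = random.Random(seed)
    delta, finals = random_dfa(n, alphabet, p_final, rng)
    add_random_transition(delta, n, alphabet, rng)
    trans, accepting = determinize(delta, finals, alphabet)
    return minimal_size(trans, accepting)
```

Calling `sample_state_complexity(12, seed=0)` many times with different seeds gives an empirical distribution of the state complexity. The subset construction may visit up to $2^n$ subsets, so this sketch is only practical for small $n$ and says nothing about the asymptotic regime studied in the paper.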

Paper Structure (19 sections, 26 theorems, 68 equations, 5 figures)

Key Result

Proposition 2.1

If there is a set of accessible states $X$ in a deterministic automaton $\mathcal{A}$ such that the states of $X$ are pairwise non-equivalent, then $\mathcal{A}$ has state complexity at least $|X|$.
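A short justification of this statement, given here as a standard Myhill–Nerode style argument and not necessarily the paper's own proof. The symbols $q_0$, $\delta$, $F$ and $L$ are assumed notation for the initial state, the transition function, the set of final states and the language of $\mathcal{A}$.

```latex
% Assumed notation: q_0 initial state, \delta transition function,
% F final states, L the language recognized by \mathcal{A}.
For each $q \in X$, accessibility gives a word $u_q$ with $\delta(q_0, u_q) = q$. Writing
\[
  L_q = \{\, w : \delta(q, w) \in F \,\}, \qquad
  u_q^{-1} L = \{\, w : u_q w \in L \,\} = L_q ,
\]
two states $q \neq q'$ of $X$ are non-equivalent exactly when $L_q \neq L_{q'}$. Hence
\[
  \bigl|\{\, u^{-1} L : u \in \Sigma^* \,\}\bigr| \;\geq\; |X| ,
\]
and the left-hand side is the number of states of the minimal automaton of $L$.
```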

Figures (5)

  • Figure 1: On the left, a non-deterministic automaton with $n$ states recognizing the language ${\mathcal{L}}_\ell = \Sigma^*a\Sigma^{n-2}$. On the right, a non-deterministic automaton with $n$ states recognizing the language ${\mathcal{L}}_r = \Sigma^*a^{n-1}$. The minimal automaton of ${\mathcal{L}}_\ell$ has $2^{n-1}$ states, whereas the one of ${\mathcal{L}}_r$ has $n$ states.
  • Figure 2: Illustration of the proof sketch of Theorem \ref{th:main}. On the left, the $h$-backward substructure from $p$ that is detailed in Section \ref{sec:backward}. It has size $\Theta(\sqrt{n})$ and contains $\Theta(\sqrt{n})$ extremal leaves (i.e. leaves in its last level $h$) to be valid. In the center, the forward tree from $r$, described in Section \ref{sec:forward}; it is a breadth-first traversal that is valid if it hits an extremal leaf of the backward substructure before $\sqrt{n}$ states are examined. On the right, the $b$-threads introduced in Section \ref{sec:b-cycles}, obtained by reading $b$'s from the $p_i$'s; they are valid if they are made of previously unseen states and are pairwise disjoint until they cycle back on themselves, forming a $b$-cycle of length in $[\![\sqrt{n},2\sqrt{n}\,]\!]$.
  • Figure 3: On the left, a backward multi-tree which is not a backward tree because, for instance, the state 2 labels two nodes. In the center, a backward multi-tree which is a tree with its corresponding backward substructure on the right.
  • Figure 4: An illustration for the values of $r$, $s$ and $z$ in the proof of Lemma \ref{lm:lower bound proba}.
  • Figure 5: On the left, two primitive $b$-cycles (accepting states are denoted by double circles) whose associated words are $0011$ (top) and $001$ (bottom), starting at $0$ and $\alpha$, respectively. On the right, the $b$-cycle of $\{0,\alpha\}$ of associated word $0011\odot 001 = 001101111011$, which is primitive by Lemma \ref{lm:primitive product}.
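The state counts quoted in the caption of Figure 1 can be checked for small $n$. The sketch below is an illustration under assumptions: it builds one standard $n$-state non-deterministic automaton for each of ${\mathcal{L}}_\ell = \Sigma^*a\Sigma^{n-2}$ and ${\mathcal{L}}_r = \Sigma^*a^{n-1}$ over $\Sigma = \{a, b\}$, which may differ in detail from the automata drawn in the figure. It then determinizes and minimizes them. All function names are hypothetical.

```python
from collections import deque

ALPHABET = "ab"  # assumed two-letter alphabet

def nfa_left(n):
    """NFA for Sigma^* a Sigma^(n-2): state 0 guesses the marked 'a', then n-2 letters follow."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n - 1):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta, {n - 1}

def nfa_right(n):
    """NFA for Sigma^* a^(n-1): state 0 guesses where the final run of n-1 a's starts."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n - 1):
        delta[(i, "a")] = {i + 1}
    return delta, {n - 1}

def minimal_dfa_size(delta, finals, start=0):
    """Subset construction on reachable subsets, followed by Moore partition refinement."""
    init = frozenset([start])
    index, order, queue, trans = {init: 0}, [init], deque([init]), []
    while queue:
        s = queue.popleft()
        row = []
        for a in ALPHABET:
            # Missing transitions contribute nothing to the target subset.
            t = frozenset(r for q in s for r in delta.get((q, a), ()))
            if t not in index:
                index[t] = len(order)
                order.append(t)
                queue.append(t)
            row.append(index[t])
        trans.append(row)
    cls = [int(bool(s & finals)) for s in order]
    count = len(set(cls))
    while True:
        sig = {}
        cls = [sig.setdefault((cls[q], *(cls[t] for t in row)), len(sig))
               for q, row in enumerate(trans)]
        if len(sig) == count:  # partition is stable
            return count
        count = len(sig)

for n in range(2, 8):
    left = minimal_dfa_size(*nfa_left(n))
    right = minimal_dfa_size(*nfa_right(n))
    print(n, left, right)  # the caption predicts 2**(n-1) and n
```

The two automata differ only in which letters advance the counting states, yet one determinizes to exponentially many states and the other does not, which is the contrast the figure is drawn to show.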

Theorems & Definitions (45)

  • Proposition 2.1
  • Theorem 3.1
  • Corollary 3.2
  • proof
  • Lemma 4.1
  • Lemma 5.1
  • proof
  • Lemma 5.2
  • proof
  • Theorem 5.3
  • ...and 35 more