DeepDFA: Automata Learning through Neural Probabilistic Relaxations

Elena Umili, Roberto Capobianco

TL;DR

DeepDFA presents a differentiable, probabilistic relaxation of DFAs that can be trained with gradient descent and then converted to crisp DFAs via temperature annealing and Hopcroft minimization. It combines the interpretability of classical automata with the scalability and noise-tolerance of neural models, enabling efficient learning from traces even with imperfect grounding. Across Tomita benchmarks and random DFAs, DeepDFA achieves high accuracy with state counts close to the target and outperforms SAT-based and RNN-based baselines, particularly under label and symbol noise. This approach offers a practical, scalable pathway for automatic grammar induction and DFA identification with robust performance in real-world noisy settings.

Abstract

In this work, we introduce DeepDFA, a novel approach to identifying Deterministic Finite Automata (DFAs) from traces, harnessing a differentiable yet discrete model. Inspired by both the probabilistic relaxation of DFAs and Recurrent Neural Networks (RNNs), our model offers interpretability post-training, alongside reduced complexity and enhanced training efficiency compared to traditional RNNs. Moreover, by leveraging gradient-based optimization, our method surpasses combinatorial approaches in both scalability and noise resilience. Validation experiments conducted on target regular languages of varying size and complexity demonstrate that our approach is accurate, fast, and robust to noise in both the input symbols and the output labels of training data, integrating the strengths of both logical grammar induction and deep learning.
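The TL;DR mentions that the relaxed model is turned into a crisp DFA through temperature annealing. The exact parameterization is not given in this summary, so the following is only a minimal sketch, assuming that each row of transition probabilities comes from a softmax over learnable logits divided by a temperature. The function name `softmax_with_temperature` and the logits are hypothetical. It shows how lowering the temperature pushes a probability row toward a one-hot vector.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature, computed in a numerically stable way."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one (state, symbol) pair over three candidate next states.
logits = [2.0, 1.0, 0.5]

# Assumed annealing schedule: the temperature shrinks, the row sharpens.
for temperature in (1.0, 0.5, 0.1, 0.01):
    row = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in row])
```

At a high temperature the row stays soft, which keeps gradients informative; at a low temperature almost all mass sits on the largest logit, so the row can be read as a deterministic transition.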

Paper Structure (41 sections, 6 equations, 3 figures, 15 tables)

Figures (3)

  • Figure 1: a) An example of a PFA with three states and two symbols: graph describing the PFA, equivalent representation in matrix form, and the states and acceptance probabilities produced while processing the string "ab". b) An example of a DFA: graph describing the DFA, equivalent representation in matrix form, and the states and acceptance probabilities produced while processing the string "ab". The DFA in (b) is obtained from the PFA in (a) by approximating its matrix representation with the closest one-hot vectors. c) DeepDFA processing the string "ab".
  • Figure 2: Ablation study 1: results on Tomita 5 with different error rates in the training dataset. a) Test accuracy. b) Number of states of predicted DFAs. Ablation study 2: results on a random DFA of size 20 and alphabet size 3, varying the state size hyperparameter. c) Test accuracy. d) Predicted number of states. Each box represents 10 experiments performed with different random seeds. The box extends from the lower to the upper quartile of the data, with a line at the median. The whiskers extend from the box to show the range of the data. Flier points represent outliers.
  • Figure 3: Learning DFAs from traces composed of imperfectly grounded symbols: test accuracy on the 7 Tomita languages. Blue: results obtained with the extension of DeepDFA to probabilistic symbols. Orange: results of DFA-inductor.
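
The matrix view described in the Figure 1 caption can be sketched in a few lines of Python. This is an illustrative sketch only: the transition matrices, acceptance probabilities, and helper names (`step`, `run`, `harden`) are made up and are not the values or code from the paper. It processes the string "ab" with a three-state, two-symbol PFA, then rounds every row to its closest one-hot vector to obtain a DFA and processes the same string again. Thresholding the acceptance probabilities at 0.5 is an assumption used here as the binary analogue of that rounding.

```python
def step(state_dist, matrix):
    """One transition: row vector times a row-stochastic matrix."""
    size = len(matrix[0])
    return [
        sum(state_dist[i] * matrix[i][j] for i in range(len(state_dist)))
        for j in range(size)
    ]

def run(transitions, accept, string, start):
    """Return the final state distribution and the acceptance probability."""
    dist = list(start)
    for symbol in string:
        dist = step(dist, transitions[symbol])
    return dist, sum(d * a for d, a in zip(dist, accept))

def harden(rows):
    """Replace each row by the one-hot vector at its argmax."""
    hardened = []
    for row in rows:
        k = max(range(len(row)), key=row.__getitem__)
        hardened.append([1.0 if j == k else 0.0 for j in range(len(row))])
    return hardened

# Hypothetical PFA: pfa[symbol][q][q2] = P(next state q2 | state q, symbol).
pfa = {
    "a": [[0.1, 0.8, 0.1],
          [0.0, 0.2, 0.8],
          [0.1, 0.1, 0.8]],
    "b": [[0.9, 0.1, 0.0],
          [0.1, 0.1, 0.8],
          [0.0, 0.1, 0.9]],
}
accept = [0.1, 0.2, 0.9]   # hypothetical per-state acceptance probabilities
start = [1.0, 0.0, 0.0]    # the automaton starts in state 0

# Crisp DFA: one-hot transition rows and binary acceptance.
dfa = {symbol: harden(matrix) for symbol, matrix in pfa.items()}
dfa_accept = [1.0 if p >= 0.5 else 0.0 for p in accept]

pfa_dist, pfa_prob = run(pfa, accept, "ab", start)
dfa_dist, dfa_prob = run(dfa, dfa_accept, "ab", start)
print("PFA:", pfa_dist, pfa_prob)
print("DFA:", dfa_dist, dfa_prob)
```

With the PFA, the state after each symbol is a distribution over states and the output is an acceptance probability. With the hardened DFA, the same matrix products keep the state one-hot, so the run follows a single path and the output is a crisp accept or reject.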