Extracting Robust Register Automata from Neural Networks over Data Sequences
Chih-Duo Hong, Hongjian Jiang, Anthony W. Lin, Oliver Markgraf, Julian Parsert, Tony Tan
TL;DR
This work develops a framework for extracting robust deterministic register automata (DRAs) from neural networks that operate on data sequences, enabling interpretable, symbolically verifiable surrogates. It introduces two-head RAAs with an accumulator to model perturbation costs and proves that, for a fixed number of registers, robustness of DRA languages can be decided in polynomial time via reductions to coverability and shortest-path problems. The authors provide three learning paradigms (SMT-based, local-search, and active learning) plus a robustness-aware extraction loop that yields DRAs with statistical guarantees, verified through PAC-style equivalence checks and local δ-stability testing. Extensive experiments on RNNs and transformers across 18 languages demonstrate that the approach learns accurate surrogate DRAs and can certify or refute network robustness in a principled, distribution-aware manner. Overall, the framework bridges neural network interpretability and formal reasoning for sequential data, with practical impact on robustness analysis in time-series and related domains.
Abstract
Automata extraction is a method for synthesising interpretable surrogates for black-box neural models that can be analysed symbolically. Existing techniques assume a finite input alphabet, and thus are not directly applicable to data sequences drawn from continuous domains. We address this challenge with deterministic register automata (DRAs), which extend finite automata with registers that store and compare numeric values. Our main contribution is a framework for robust DRA extraction from black-box models: we develop a polynomial-time robustness checker for DRAs with a fixed number of registers, and combine it with passive and active automata learning algorithms. This combination yields surrogate DRAs with statistical robustness and equivalence guarantees. As a key application, we use the extracted automata to assess the robustness of neural networks: for a given sequence and distance metric, the DRA either certifies local robustness or produces a concrete counterexample. Experiments on recurrent neural networks and transformer architectures show that our framework reliably learns accurate automata and enables principled robustness evaluation. Overall, our results demonstrate that robust DRA extraction effectively bridges neural network interpretability and formal reasoning without requiring white-box access to the underlying network.
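To make the register mechanism concrete, the following is a minimal Python sketch of a one-register DRA and a local robustness probe. It is an illustration under stated assumptions, not the authors' implementation: the `DRA` class, the `INCREASING` example language (strictly increasing sequences), and the `find_counterexample` helper are all hypothetical names introduced here. The probe uses random sampling inside an L∞ ball of radius `delta`, which is a stand-in for, and much weaker than, the polynomial-time robustness checker described in the abstract.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

Registers = Tuple[float, ...]
Guard = Callable[[float, Registers], bool]        # may the transition fire?
Update = Callable[[float, Registers], Registers]  # new register contents
Transition = Tuple[Guard, Update, str]            # (guard, update, target state)


@dataclass
class DRA:
    """Toy deterministic register automaton (hypothetical structure)."""
    initial: str
    accepting: Set[str]
    n_registers: int
    transitions: Dict[str, List[Transition]]

    def accepts(self, word: Sequence[float]) -> bool:
        state = self.initial
        regs: Registers = (0.0,) * self.n_registers
        for x in word:
            # Guards are assumed mutually exclusive, so at most one can fire.
            for guard, update, target in self.transitions.get(state, []):
                if guard(x, regs):
                    regs, state = update(x, regs), target
                    break
            else:
                return False  # no enabled transition: reject
        return state in self.accepting


# Example language: strictly increasing sequences.
# The single register stores the previously read value.
INCREASING = DRA(
    initial="start",
    accepting={"start", "seen"},
    n_registers=1,
    transitions={
        "start": [(lambda x, r: True, lambda x, r: (x,), "seen")],
        "seen": [(lambda x, r: x > r[0], lambda x, r: (x,), "seen")],
    },
)


def find_counterexample(
    dra: DRA,
    word: Sequence[float],
    delta: float,
    trials: int = 1000,
    seed: int = 0,
) -> Optional[List[float]]:
    """Sample perturbations within L-infinity distance delta of `word`.

    Returns a perturbed word whose label differs from that of `word`,
    or None if sampling finds none. None is NOT a robustness certificate.
    """
    rng = random.Random(seed)
    label = dra.accepts(word)
    for _ in range(trials):
        candidate = [x + rng.uniform(-delta, delta) for x in word]
        if dra.accepts(candidate) != label:
            return candidate
    return None
```

With `delta=0.1`, the sequence `[1.0, 2.0, 3.0]` cannot change label, because consecutive values differ by more than `2 * delta`, so the probe returns `None`. For `[1.0, 1.05, 3.0]` a perturbation can swap the order of the first two values, and the probe can return such a rejected sequence as a concrete counterexample.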
