Analyzing Robustness of Angluin's L$^*$ Algorithm in Presence of Noise
Lina Ye, Igor Khmelnitsky, Serge Haddad, Benoît Barbot, Benedikt Bollig, Martin Leucker, Daniel Neider, Rajarshi Roy
TL;DR
This paper studies how Angluin's PAC variant of the L$^*$ algorithm (KV's algorithm) behaves when the access device is perturbed by noise. It introduces four noise models (random output, random input, counter DFA, and pathological behaviours) and evaluates robustness through extensive experiments on hundreds of randomly generated DFAs, paired with a theoretical analysis of randomness versus structure. The empirical results show robust learning under random noise but failure under structured noise, while the theory proves that random perturbations make the learned language almost surely non-recursively enumerable. The authors also propose a practical size-reduction strategy to counter overfitting and discuss implications for learning from noisy black-box devices in real-world settings.
Abstract
Angluin's L$^*$ algorithm learns the minimal deterministic finite automaton (DFA) of a regular language using membership and equivalence queries. Its probabilistic approximatively correct (PAC) version substitutes an equivalence query by numerous random membership queries to get a high level confidence to the answer. Thus it can be applied to any kind of device and may be viewed as an algorithm for synthesizing an automaton abstracting the behavior of the device based on observations. Here we are interested on how Angluin's PAC learning algorithm behaves for devices which are obtained from a DFA by introducing some noise. More precisely we study whether Angluin's algorithm reduces the noise and produces a DFA closer to the original one than the noisy device. We propose several ways to introduce the noise: (1) the noisy device inverts the classification of words w.r.t. the DFA with a small probability, (2) the noisy device modifies with a small probability the letters of the word before asking its classification w.r.t. the DFA, (3) the noisy device combines the classification of a word w.r.t. the DFA and its classification w.r.t. a counter automaton, and (4) the noisy DFA is obtained by a random process from two DFA such that the language of the first one is included in the second one. Then when a word is accepted (resp. rejected) by the first (resp. second) one, it is also accepted (resp. rejected) and in the remaining cases, it is accepted with probability 0.5. Our main experimental contributions consist in showing that: (1) Angluin's algorithm behaves well whenever the noisy device is produced by a random process, (2) but poorly with a structured noise, and, that (3) is able to eliminate pathological behaviours specified in a regular way. Theoretically, we show that randomness almost surely yields systems with non-recursively enumerable languages.
