An $\mathbf{L^*}$ Algorithm for Deterministic Weighted Regular Languages
Clemente Pasti, Talu Karagöz, Anej Svete, Franz Nowak, Reda Boumasmoud, Ryan Cotterell
TL;DR
This work extends Angluin's $\mathbf{L^*}$ algorithm to weighted regular languages, learning deterministic weighted finite-state automata (WFSAs) over semifields from observation tables, i.e., finite sub-blocks of the Hankel matrix of the target language. The method preserves the spirit of the original algorithm, learning exactly via a sequence of closed and consistent observation tables and a quotient construction that yields a minimal deterministic automaton. Theoretical results guarantee that the algorithm terminates with a minimal automaton and establish a running time polynomial in the size of the target automaton and the length of the longest counterexample. Practically, the approach yields interpretable weighted models of language behavior and offers a principled framework for analyzing probabilistic and cost-annotated processes. The work also clarifies the link between weighted automata minimization and the $\mathbf{L^*}$ learning paradigm, highlighting both the capabilities and the limitations of the approach when weights are real-valued or, more generally, drawn from a semifield.
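The closedness and consistency conditions and the quotient construction can be sketched in a few lines. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: weights are exact positive rationals (a semifield), the membership oracle `target` is a hypothetical toy language (weight $(1/2)^{|w|}$ if $w$ contains an even number of `a`s, else $0$), and every helper name (`row`, `scale_between`, `is_closed`, `is_consistent`, `build_hypothesis`, `evaluate`) is hypothetical. Because the weights support division, two prefixes are merged when their rows are nonzero scalar multiples of each other, rather than when the rows are equal as in unweighted $\mathbf{L^*}$. The sketch also assumes that no row of the table is identically zero.

```python
from fractions import Fraction
from itertools import combinations, product

ALPHABET = "ab"


def target(w):
    # Hypothetical membership oracle over the positive-rational semifield:
    # weight (1/2)^|w| if w has an even number of a's, otherwise 0.
    return Fraction(1, 2 ** len(w)) if w.count("a") % 2 == 0 else Fraction(0)


def row(u, suffixes, f=target):
    # Row of prefix u in the observation table: (f(u e) for each suffix e).
    return tuple(f(u + e) for e in suffixes)


def scale_between(r, s):
    """Return lam != 0 with r == lam * s entrywise, or None if none exists.

    Assumes rows are not identically zero (two all-zero rows return 1)."""
    lam = None
    for x, y in zip(r, s):
        if y == 0:
            if x != 0:
                return None
            continue
        ratio = x / y
        if ratio == 0 or (lam is not None and ratio != lam):
            return None
        lam = ratio
    return lam if lam is not None else Fraction(1)


def is_closed(prefixes, suffixes, f=target):
    # Every extension row row(s + a) must be proportional to some prefix row.
    rows = [row(s, suffixes, f) for s in prefixes]
    return all(
        any(scale_between(row(s + a, suffixes, f), t) is not None for t in rows)
        for s in prefixes
        for a in ALPHABET
    )


def is_consistent(prefixes, suffixes, f=target):
    # If row(s1) = lam * row(s2), then row(s1 + a) must equal lam * row(s2 + a).
    for s1, s2 in combinations(prefixes, 2):
        lam = scale_between(row(s1, suffixes, f), row(s2, suffixes, f))
        if lam is None:
            continue
        for a in ALPHABET:
            if scale_between(row(s1 + a, suffixes, f), row(s2 + a, suffixes, f)) != lam:
                return False
    return True


def build_hypothesis(prefixes, suffixes, f=target):
    # Quotient construction: one state per proportionality class of rows.
    reps = []
    for s in prefixes:
        if all(scale_between(row(s, suffixes, f), row(r, suffixes, f)) is None for r in reps):
            reps.append(s)

    def locate(u):
        # Return (state, factor) with row(u) = factor * row(representative).
        r_u = row(u, suffixes, f)
        for q, rep in enumerate(reps):
            lam = scale_between(r_u, row(rep, suffixes, f))
            if lam is not None:
                return q, lam
        raise ValueError("observation table is not closed")

    init = locate("")
    delta = {(q, a): locate(rep + a) for q, rep in enumerate(reps) for a in ALPHABET}
    final = [f(rep) for rep in reps]  # final weight = entry at the empty suffix
    return init, delta, final


def evaluate(hyp, w):
    # Deterministic run: multiply the initial factor and the transition
    # weights along w, then the final weight of the state reached.
    (q, weight), delta, final = hyp
    for a in w:
        q, lam = delta[(q, a)]
        weight *= lam
    return weight * final[q]


S, E = ["", "a"], ["", "a"]  # prefixes (state candidates) and suffixes
assert is_closed(S, E) and is_consistent(S, E)
hyp = build_hypothesis(S, E)
# The two-state hypothesis agrees with the oracle on all strings up to length 6.
assert all(evaluate(hyp, "".join(w)) == target("".join(w))
           for n in range(7) for w in product(ALPHABET, repeat=n))
print(evaluate(hyp, "abab"))  # 1/16
```

The proportionality test is what lets division absorb prefix-dependent scaling: the extension row for `"b"` differs from the row for `""` but equals half of it, so in this sketch `b` loops on the initial state with weight $1/2$ instead of creating a new state.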
Abstract
Extracting finite state automata (FSAs) from black-box models offers a powerful approach to gaining interpretable insights into complex model behaviors. To support this pursuit, we present a weighted variant of Angluin's (1987) $\mathbf{L^*}$ algorithm for learning FSAs. We stay faithful to the original algorithm, devising a way to exactly learn deterministic weighted FSAs whose weights support division. Furthermore, we formulate the learning process in a manner that highlights the connection with FSA minimization, showing how $\mathbf{L^*}$ directly learns a minimal automaton for the target language.
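The connection to minimization rests on a Myhill-Nerode-style relation for deterministic weighted languages. The formulation below is a standard one, assumed here for illustration rather than quoted from the paper; we write $f\colon \Sigma^* \to \mathbb{K}$ for the target language over a semifield $\mathbb{K}$. Two prefixes are equivalent when their residuals differ by a nonzero left factor:

$$
u \equiv_f v \iff \exists\, \lambda \in \mathbb{K} \setminus \{\mathbf{0}\} \text{ such that } f(uw) = \lambda \otimes f(vw) \text{ for all } w \in \Sigma^*.
$$

Under this assumption, each state of the minimal deterministic WFSA corresponds to one class of $\equiv_f$ among prefixes whose residual $w \mapsto f(uw)$ is not identically $\mathbf{0}$, and the observation table of $\mathbf{L^*}$ tests the same condition restricted to a finite set of suffixes.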
