An $\mathbf{L^*}$ Algorithm for Deterministic Weighted Regular Languages

Clemente Pasti, Talu Karagöz, Anej Svete, Franz Nowak, Reda Boumasmoud, Ryan Cotterell

TL;DR

This work extends Angluin's $L^*$ to weighted regular languages by learning deterministic weighted finite-state automata (WFSAs) over semifields using empirical Hankel systems. The method preserves the spirit of the original algorithm, providing exact learning via a sequence of closed and consistent Hankel matrices and a quotient construction that yields a minimal, deterministic automaton. Theoretical results guarantee termination with a minimal automaton and establish a polynomial-time bound in terms of the target automaton size and the length of the longest counterexample. Practically, the approach enables interpretable, weight-bearing models of language behavior and offers a principled framework for analyzing probabilistic and cost-annotated processes. The work also clarifies the link between weighted automata minimization and the $L^*$-style learning paradigm, highlighting both capabilities and limitations when weights are real-valued or drawn from a semifield.
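To make the notion of a closed empirical Hankel matrix more concrete, here is a minimal Python sketch. It is not the paper's construction: it assumes that, because the weights support division, two rows count as equivalent when one is a scalar multiple of the other, and that a table is closed when every one-symbol extension of a prefix in P is equivalent to the row of some prefix in P. The names `hankel_row`, `scalar_ratio`, `is_closed`, and the toy language `target` are all hypothetical and exist only for illustration.

```python
from fractions import Fraction


def target(w):
    """Toy weighted language (an assumption, not from the paper):
    weight (1/2)^|w| if w has an even number of 'b's, else 0."""
    if w.count("b") % 2 == 1:
        return Fraction(0)
    return Fraction(1, 2) ** len(w)


def hankel_row(f, prefix, suffixes):
    """Row of the empirical Hankel matrix: H[prefix, s] = f(prefix + s)."""
    return tuple(f(prefix + s) for s in suffixes)


def scalar_ratio(row_a, row_b):
    """Return k such that row_a == k * row_b, or None if no such k exists."""
    k = None
    for a, b in zip(row_a, row_b):
        if b == 0:
            if a != 0:
                return None  # a nonzero entry cannot be a multiple of zero
            continue
        ratio = a / b  # division is available because weights form a semifield
        if k is None:
            k = ratio
        elif ratio != k:
            return None
    # If row_b is all zeros, row_a must be too, and any k works; pick 1.
    return Fraction(1) if k is None else k


def is_closed(f, prefixes, suffixes, alphabet):
    """Assumed closedness test: every extension row is a scalar multiple
    of the row of some prefix already in the table."""
    rows = [hankel_row(f, p, suffixes) for p in prefixes]
    for p in prefixes:
        for a in alphabet:
            ext = hankel_row(f, p + a, suffixes)
            if all(scalar_ratio(ext, r) is None for r in rows):
                return False
    return True
```

With suffixes `["", "b"]`, the table built from the single prefix `""` is not closed under this test, because the row of `"b"` is not a multiple of the row of `""`. Adding `"b"` to the prefixes closes it, and the scalar returned by `scalar_ratio` is a natural candidate for the weight of the corresponding transition.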

Abstract

Extracting finite state automata (FSAs) from black-box models offers a powerful approach to gaining interpretable insights into complex model behaviors. To support this pursuit, we present a weighted variant of Angluin's (1987) $\mathbf{L^*}$ algorithm for learning FSAs. We stay faithful to the original algorithm, devising a way to exactly learn deterministic weighted FSAs whose weights support division. Furthermore, we formulate the learning process in a manner that highlights the connection with FSA minimization, showing how $\mathbf{L^*}$ directly learns a minimal automaton for the target language.

Paper Structure

This paper contains 24 sections, 8 theorems, 10 equations, 2 algorithms.

Key Result

theorem 1

Let $\overline{\mathbf{H}} = (\mathrm{P}, \mathrm{S}, \mathbf{H})$ be an empirical Hankel system. The equivalence relation $\sim_{\mathbf{H}}$ on $\mathcal{A}_{\mathbf{H}}$ is transition-regular (see transreg), which means that for every …

Theorems & Definitions (17)

  • definition 1
  • theorem 1: $\naivehankelautomaton$ is transition-regular
  • proof
  • theorem 2: The empirical Hankel Automaton $\emphankelautomaton$
  • proof
  • theorem 3: Minimality of $\emphankelautomaton$
  • proof
  • corollary 1: Termination
  • theorem 4
  • proof
  • ...and 7 more