
Active Learning of Symbolic Automata Over Rational Numbers

Sebastian Hagedorn, Martín Muñoz, Cristian Riveros, Rodrigo Toro Icarte

TL;DR

This work extends Angluin's $L^*$ algorithm to learn symbolic automata over the rational numbers, working in the minimally adequate teacher (MAT) framework over infinite alphabets. The core idea is to learn finite piecewise functions over $\mathbb{Q}$ with a Stern–Brocot-based interval learning scheme and then integrate this scheme into an $L^*$-style framework to learn symbolic finite automata (SFAs) with inequality predicates. The authors prove that the resulting learning process uses a number of membership and equivalence queries that is linear in the size of the target representation, which is optimal, and that it places no restrictions on the form of counterexamples. The approach applies to practical settings such as RGX and time-series analysis, and it identifies interval endpoints efficiently by exploiting break links and convergents in the Stern–Brocot tree. Overall, the paper delivers a theoretically grounded, query-efficient method for learning SFAs over dense rational alphabets, with applications in AI and software engineering.
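For readers unfamiliar with the Stern–Brocot tree, the following is a minimal sketch (not the authors' code) of two standard facts the summary relies on: every positive rational is reached from the root 1/1 by a unique left/right path of mediant steps, and the run lengths of that path are the continued-fraction coefficients, whose partial evaluations are the convergents. The function names `stern_brocot_path`, `continued_fraction`, and `convergents` are hypothetical, and the sketch covers only the classical tree of positive rationals; it does not reproduce the paper's break-link procedure.

```python
from fractions import Fraction


def stern_brocot_path(q: Fraction) -> str:
    """Return the L/R path from the root 1/1 to q in the classical Stern–Brocot tree.

    Assumes q > 0 (hypothetical helper; the root corresponds to the empty path).
    """
    if q <= 0:
        raise ValueError("the classical Stern–Brocot tree only contains positive rationals")
    lo_n, lo_d = 0, 1  # left bound 0/1
    hi_n, hi_d = 1, 0  # right bound 1/0, i.e. +infinity (never turned into a Fraction)
    path = []
    while True:
        m = Fraction(lo_n + hi_n, lo_d + hi_d)  # mediant of the current bounds
        if q == m:
            return "".join(path)
        if q < m:
            path.append("L")
            hi_n, hi_d = m.numerator, m.denominator
        else:
            path.append("R")
            lo_n, lo_d = m.numerator, m.denominator


def continued_fraction(q: Fraction) -> list[int]:
    """Coefficients [a0; a1, ..., an] of q via the Euclidean algorithm."""
    n, d = q.numerator, q.denominator
    coeffs = []
    while d != 0:
        a = n // d  # floor division keeps every remainder non-negative, so q < 0 also works
        coeffs.append(a)
        n, d = d, n - a * d
    return coeffs


def convergents(q: Fraction) -> list[Fraction]:
    """Successive convergents h_i / k_i of q, ending with q itself."""
    h_prev, h = 0, 1  # h_{-2}, h_{-1}
    k_prev, k = 1, 0  # k_{-2}, k_{-1}
    out = []
    for a in continued_fraction(q):
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        out.append(Fraction(h, k))
    return out


print(stern_brocot_path(Fraction(7, 3)))  # → RRLL
print(continued_fraction(Fraction(7, 3)))  # → [2, 3]
```

For 7/3 = [2; 3], the path is R^2 L^2: the runs follow the coefficients, with the last one shortened by one. Because a path can be long even for a rational with a short continued fraction, walking it one node at a time is slow; jumping between convergents is the natural way to skip whole runs.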

Abstract

Automata learning has many applications in artificial intelligence and software engineering. Central to these applications is the $L^*$ algorithm, introduced by Angluin. The $L^*$ algorithm learns deterministic finite-state automata (DFAs) in polynomial time when provided with a minimally adequate teacher. Unfortunately, the $L^*$ algorithm can only learn DFAs over finite alphabets, which limits its applicability. In this paper, we extend $L^*$ to learn symbolic automata whose transitions use predicates over rational numbers, i.e., over infinite and dense alphabets. Our result makes the $L^*$ algorithm applicable to new settings such as (real) RGX and time series. Furthermore, our proposed algorithm is optimal in the sense that the number of queries it asks the teacher is at most linear in the number of transitions and in the representation size of the predicates.
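To make "symbolic automata whose transitions use predicates over rational numbers" concrete, here is a minimal sketch of a deterministic SFA whose guards are single intervals over $\mathbb{Q}$. The classes `Interval` and `SFA` and the example automaton are hypothetical illustrations, assuming each guard is one interval with open or closed endpoints and that the guards leaving a state partition $\mathbb{Q}$; the paper's own formalism may allow richer inequality formulas.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Hypothetical guard: lo <(=) x <(=) hi over Q; None means unbounded on that side."""
    lo: Optional[Fraction] = None
    hi: Optional[Fraction] = None
    lo_closed: bool = False
    hi_closed: bool = False

    def holds(self, x: Fraction) -> bool:
        if self.lo is not None and (x < self.lo or (x == self.lo and not self.lo_closed)):
            return False
        if self.hi is not None and (x > self.hi or (x == self.hi and not self.hi_closed)):
            return False
        return True


@dataclass
class SFA:
    """Hypothetical deterministic SFA: guards leaving each state are assumed to partition Q."""
    initial: str
    accepting: set[str]
    transitions: dict[str, list[tuple[Interval, str]]]  # state -> [(guard, target)]

    def step(self, state: str, x: Fraction) -> str:
        for guard, target in self.transitions[state]:
            if guard.holds(x):
                return target
        raise ValueError(f"no guard of state {state!r} matches {x}")

    def accepts(self, word: list[Fraction]) -> bool:
        state = self.initial
        for x in word:
            state = self.step(state, x)
        return state in self.accepting


# Example time-series pattern: "the value exceeds 10 at some point".
ten = Fraction(10)
sfa = SFA(
    initial="q0",
    accepting={"q1"},
    transitions={
        "q0": [(Interval(hi=ten, hi_closed=True), "q0"), (Interval(lo=ten), "q1")],
        "q1": [(Interval(), "q1")],  # unbounded guard: any rational stays in q1
    },
)
print(sfa.accepts([Fraction(3), Fraction(21, 2)]))  # → True
print(sfa.accepts([Fraction(10), Fraction(-1, 3)]))  # → False
```

The alphabet is infinite and dense, yet the automaton stays finite because each transition is labelled by a predicate rather than by a single symbol; this is the representation whose size the query bound refers to.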

Paper Structure

This paper contains 30 sections, 12 theorems, 22 equations, 2 figures, 7 algorithms.

Key Result

Theorem 1

For the class $\mathcal{C}$ of finite piecewise functions, there exists a learning algorithm that learns an unknown concept $\gamma \in \mathcal{C}$ in the MAT framework using a number of membership and equivalence queries that is linear in $\operatorname{size}(\gamma)$.
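As an illustration of the concept class in Theorem 1, the sketch below encodes a finite piecewise function $\gamma : \mathbb{Q} \to$ (finite set) by a strictly increasing list of breakpoints, one value at each breakpoint, and one value on each open gap between them; a membership query then amounts to evaluating $\gamma(q)$. The class name `PiecewiseFunction` and this particular encoding are assumptions made for illustration: it is not necessarily the paper's canonical representation (Figure 2, bottom), and the sketch does not implement the learning algorithm or its size measure.

```python
from bisect import bisect_left
from collections.abc import Hashable, Sequence
from fractions import Fraction


class PiecewiseFunction:
    """Hypothetical encoding of gamma: Q -> finite set.

    points:    p_1 < ... < p_k (breakpoints)
    at_points: gamma(p_i) for each breakpoint
    gaps:      values on (-inf, p_1), (p_1, p_2), ..., (p_k, +inf)
    """

    def __init__(self, points: Sequence[Fraction], at_points: Sequence[Hashable],
                 gaps: Sequence[Hashable]):
        if list(points) != sorted(set(points)):
            raise ValueError("points must be strictly increasing")
        if len(at_points) != len(points) or len(gaps) != len(points) + 1:
            raise ValueError("need one value per point and len(points) + 1 gap values")
        self.points = list(points)
        self.at_points = list(at_points)
        self.gaps = list(gaps)

    def __call__(self, q: Fraction) -> Hashable:
        # i = number of breakpoints strictly below q
        i = bisect_left(self.points, q)
        if i < len(self.points) and self.points[i] == q:
            return self.at_points[i]  # q is itself a breakpoint
        return self.gaps[i]  # q lies in the open gap before points[i] (or after the last one)


# Value B on the half-open interval [0, 1/2), value A everywhere else.
gamma = PiecewiseFunction(
    points=[Fraction(0), Fraction(1, 2)],
    at_points=["B", "A"],
    gaps=["A", "B", "A"],
)
print([gamma(q) for q in (Fraction(-1), Fraction(0), Fraction(1, 4), Fraction(1, 2))])
# → ['A', 'B', 'B', 'A']
```

Storing a separate value at each breakpoint is what lets this encoding express open, closed, and half-open intervals alike, which matters over a dense domain where the endpoint's own value cannot be inferred from its neighbours.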

Figures (2)

  • Figure 1: Two SFAs with inequality formulas: (1) an SFA for an RGX and (2) an SFA for a time-series pattern.
  • Figure 2: Top: The finite piecewise function $\gamma_1$ (introduced in the section on the learning setting) shown in a Stern–Brocot tree up to depth 4. Nodes $q$ for which $\gamma_1(q)=A$ (resp. $B$) are represented by gray (resp. white) blocks. Bottom: The canonical representation of $\gamma_1$.
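The tree drawn in Figure 2 can be generated level by level: each node is the mediant of its two nearest ancestors. The sketch below, with the hypothetical helper `stern_brocot_levels`, builds the classical Stern–Brocot tree of positive rationals, assuming the root 1/1 is at depth 1. The figure may instead use an extension of the tree that covers all of $\mathbb{Q}$, which this sketch does not model.

```python
from fractions import Fraction


def stern_brocot_levels(max_depth: int) -> list[list[Fraction]]:
    """Nodes of the classical Stern–Brocot tree, level by level (root 1/1 at depth 1)."""
    # Each entry is (node, left_bound, right_bound), all kept as (numerator, denominator)
    # pairs so that the right bound 1/0 (infinity) can be represented.
    level = [((1, 1), (0, 1), (1, 0))]
    levels = []
    for _ in range(max_depth):
        levels.append([Fraction(n, d) for (n, d), _, _ in level])
        nxt = []
        for (n, d), (ln, ld), (rn, rd) in level:
            nxt.append(((ln + n, ld + d), (ln, ld), (n, d)))  # left child: mediant with left bound
            nxt.append(((n + rn, d + rd), (n, d), (rn, rd)))  # right child: mediant with right bound
        level = nxt
    return levels


for depth, nodes in enumerate(stern_brocot_levels(4), start=1):
    print(depth, [str(q) for q in nodes])
```

Each level lists its nodes in increasing order, so colouring them by the value of a piecewise function reproduces the kind of picture shown in the figure: a long run of equally coloured nodes signals that no interval endpoint lies in that region of the tree.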

Theorems & Definitions (14)

  • Theorem 1
  • Theorem 2: Theorem 1 of doi:10.1007/978-3-319-96145-3_23
  • Corollary 1
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Theorem 3
  • Proposition 4
  • Theorem 4
  • Theorem 5
  • ...and 4 more