Scalable Tree-based Register Automata Learning

Simon Dierl; Paul Fiterau-Brostean; Falk Howar; Bengt Jonsson; Konstantinos Sagonas; Fredrik Tåquist

TL;DR

This paper tackles the scalability challenge of learning register automata (RA) by introducing $SL^{\lambda}$, a tree-based learning algorithm that uses a classification tree and short, restricted symbolic suffixes to minimize test and tree-query costs. It formalizes data languages and RA, and develops a learning loop that enforces location, transition, and register consistency while constructing a canonical RA from CT leaves. The authors prove complexity bounds showing improved equivalence-query and membership-query counts over prior approaches, and demonstrate empirical gains on real-world (DTLS) and synthetic RA models. The work advances active automata learning for data-rich systems, enabling scalable modeling of protocols and data-dependent behaviors.

Abstract

Existing active automata learning (AAL) algorithms have demonstrated their potential in capturing the behavior of complex systems (e.g., in analyzing network protocol implementations). The most widely used AAL algorithms generate finite state machine models, such as Mealy machines. For many analysis tasks, however, it is crucial to generate richer classes of models that also show how relations between data parameters affect system behavior. Such models have shown potential to uncover critical bugs, but their learning algorithms do not scale beyond small and well-curated experiments. In this paper, we present $SL^{\lambda}$, an effective and scalable register automata (RA) learning algorithm that significantly reduces the number of tests required for inferring models. It achieves this by combining a tree-based cost-efficient data structure with mechanisms for computing short and restricted tests. We have implemented $SL^{\lambda}$ as a new algorithm in RALib. We evaluate its performance by comparing it against $SL^*$, the current state-of-the-art RA learning algorithm, in a series of experiments, and show superior performance and substantial asymptotic improvements in bigger systems.
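
To make concrete what "relations between data parameters" means for a register automaton, the following is a minimal Python sketch of an acceptor for a stack with capacity two, the kind of language that Figure 1 describes. The function name `accepts`, the action names `push` and `pop`, the register names `x1` and `x2`, and the choice to reject any word containing a disabled transition are all assumptions made for illustration; this is not the paper's formal definition and it does not use RALib.

```python
from typing import List, Optional, Tuple

# A data symbol: an action name paired with one data value (assumed shape).
Symbol = Tuple[str, int]


def accepts(word: List[Symbol]) -> bool:
    """Hypothetical register automaton for a stack of capacity two.

    Locations 0, 1, 2 count the stored values. Registers x1 and x2 hold
    the stored data values. Guards compare the input parameter p with a
    register, which is the data relation a plain Mealy machine cannot
    express. A word is accepted iff every transition is enabled.
    """
    location = 0
    x1: Optional[int] = None
    x2: Optional[int] = None
    for action, p in word:
        if action == "push" and location == 0:
            x1, location = p, 1          # store p in x1
        elif action == "push" and location == 1:
            x2, location = p, 2          # store p in x2
        elif action == "pop" and location == 2 and p == x2:
            x2, location = None, 1       # guard: p equals top of stack
        elif action == "pop" and location == 1 and p == x1:
            x1, location = None, 0       # guard: p equals remaining value
        else:
            return False                 # no enabled transition
    return True
```

For instance, `accepts([("push", 1), ("push", 2), ("pop", 2), ("pop", 1)])` follows the last-in-first-out order and is accepted, whereas a third consecutive `push` or a `pop` with the wrong data value is rejected.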

Paper Structure (10 sections, 2 theorems, 1 equation, 8 figures, 1 table, 3 algorithms)

Introduction
Main Ideas
Data Languages and Register Automata
The $SL^\lambda$ Learning Algorithm
Correctness and Complexity
Evaluation
Conclusion
Example: Symmetries in an SDT
Imposing Restrictions on Symbolic Suffixes
Correctness and Complexity Proofs

Key Result

Lemma

A counterexample leads to a new short prefix or to a new prefix.

Figures (8)

Figure 1: Register automaton accepting language of stack with capacity two.
Figure 2: Decision tree for $\mathcal{L}[u,\mathbf{v}](x_1,p_1,p_2)$.
Figure 3: Three hypotheses constructed by ${SL}^{\lambda}$: $\mathcal{H}_0$ (left), $\mathcal{H}_1$ and $\mathcal{H}_2$ (right).
Figure 4: Classification tree for hypothesis $\mathcal{H}_1$ in Figure 3. Short prefixes are underlined.
Figure 5: Number of resets (two leftmost graphs), counterexamples (third graph), and wall clock times (fourth graph) for inferring models of the Mbed TLS 2.26.0 server.
...and 3 more figures

Theorems & Definitions (5)

Definition
Lemma
Theorem
Proof
Proof
