Scalable Tree-based Register Automata Learning
Simon Dierl, Paul Fiterau-Brostean, Falk Howar, Bengt Jonsson, Konstantinos Sagonas, Fredrik Tåquist
TL;DR
This paper tackles the scalability challenge of learning register automata (RA) by introducing $SL^{\\lambda}$, a tree-based learning algorithm that uses a classification tree and short, restricted symbolic suffixes to minimize test and tree-query costs. It formalizes data languages and RA, and develops a learning loop that enforces location, transition, and register consistency while constructing a canonical RA from CT leaves. The authors prove complexity bounds showing improved equivalence-query and membership-query counts over prior approaches, and demonstrate empirical gains on real-world (DTLS) and synthetic RA models. The work advances active automata learning for data-rich systems, enabling scalable modeling of protocols and data-dependent behaviors.
Abstract
Existing active automata learning (AAL) algorithms have demonstrated their potential in capturing the behavior of complex systems (e.g., in analyzing network protocol implementations). The most widely used AAL algorithms generate finite state machine models, such as Mealy machines. For many analysis tasks, however, it is crucial to generate richer classes of models that also show how relations between data parameters affect system behavior. Such models have shown potential to uncover critical bugs, but their learning algorithms do not scale beyond small and well curated experiments. In this paper, we present $SL^λ$, an effective and scalable register automata (RA) learning algorithm that significantly reduces the number of tests required for inferring models. It achieves this by combining a tree-based cost-efficient data structure with mechanisms for computing short and restricted tests. We have implemented $SL^λ$ as a new algorithm in RALib. We evaluate its performance by comparing it against $SL^*$, the current state-of-the-art RA learning algorithm, in a series of experiments, and show superior performance and substantial asymptotic improvements in bigger systems.
