Database-assisted automata learning

Hielke Walinga, Robert Baumgartner, Sicco Verwer

TL;DR

DAALder addresses the challenge of learning DFAs from large trace datasets by leveraging a database-backed workflow that avoids loading all data into memory. It integrates active and passive learning within a red-blue state merging framework, using a PTA-based observation tree and PrefixQuery to iteratively refine merges via informative traces. Empirical results show DAALder matches the accuracy of conventional state merging on large datasets while using substantially less memory, with a solvability tipping point near $40{,}000$ traces that marks rapid convergence. The work highlights the practicality of database-assisted automata learning and suggests directions for richer database queries and adaptive exploration–exploitation strategies for real-world tracing data.

Abstract

This paper presents DAALder (Database-Assisted Automata Learning, with the Dutch suffix from leerder), a new algorithm for learning state machines, or automata, specifically deterministic finite-state automata (DFA). When learning state machines from log data originating from software systems, the sheer volume of log data can pose a challenge. Conventional state merging algorithms cannot deal with this efficiently, as they require a large amount of memory. To solve this, we use database technologies to efficiently query a large trace dataset and construct a state machine from it, since databases can store large amounts of data on disk while still allowing it to be queried efficiently. The proposed algorithm builds on research in both active and passive learning and combines the two. It can quickly find a characteristic set of traces from a database using heuristics from a state merging algorithm. Experiments show that our algorithm performs similarly to conventional state merging algorithms on large datasets, but requires far less memory.
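To make the database-backed idea concrete, the following is a minimal sketch of how a prefix lookup against a trace table could feed a prefix tree (PTA) without loading the whole dataset into memory. It is an illustration under stated assumptions, not the paper's implementation: the SQLite schema, the function names `build_trace_db`, `prefix_query`, and `add_to_tree`, and the encoding of a trace as a string of single-character symbols are all hypothetical.

```python
import sqlite3


def build_trace_db(traces):
    """Store (trace, accepted) pairs in SQLite. The schema is an assumption."""
    conn = sqlite3.connect(":memory:")  # a file path would keep the data on disk
    conn.execute("CREATE TABLE traces (trace TEXT PRIMARY KEY, accepted INTEGER)")
    conn.executemany("INSERT INTO traces VALUES (?, ?)", traces)
    return conn


def prefix_query(conn, prefix, limit=10):
    """Return up to `limit` stored traces starting with `prefix`, shortest first."""
    rows = conn.execute(
        "SELECT trace, accepted FROM traces "
        "WHERE substr(trace, 1, ?) = ? "
        "ORDER BY length(trace), trace LIMIT ?",
        (len(prefix), prefix, limit),
    )
    return rows.fetchall()


def add_to_tree(tree, trace, accepted):
    """Insert one labeled trace into a prefix tree stored as nested dicts."""
    node = tree
    for symbol in trace:
        node = node.setdefault("children", {}).setdefault(symbol, {})
    node["accepted"] = bool(accepted)


# Hypothetical toy data: only traces below the prefix "a" are pulled into memory.
conn = build_trace_db([("ab", 1), ("abb", 0), ("ba", 1), ("a", 0)])
tree = {}
for trace, accepted in prefix_query(conn, "a"):
    add_to_tree(tree, trace, accepted)
```

In this sketch the learner would decide which prefix to ask about (for example, the access sequence of a state under consideration for a merge), and only the matching traces are added to the in-memory tree. The rest of the dataset stays in the database.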

Paper Structure (21 sections, 3 figures)

Figures (3)

  • Figure 1: Graphical representation of DAALder
  • Figure 2: Time and memory usage for EDSM vs DAALder
  • Figure 3: Amount of traces included in the final model for DAALder