LLMs as Probabilistic Minimally Adequate Teachers for DFA Learning

Lekai Chen; Ashutosh Trivedi; Alvaro Velasquez

TL;DR

This work tackles learning deterministic finite automata (DFA) in the presence of imperfect large language model (LLM) oracles by introducing the probabilistic Minimally Adequate Teacher (pMAT). It combines two prompting strategies, Discrimination and Verification, with a hybrid LearnAnyWay with Passive Refinement (LAPR) algorithm that uses a query cache to correct persistent membership-query errors and leverage counterexamples. Empirical results show significant reductions in query-level error rates and robust DFA recovery under noisy MQ conditions, with LAPR outperforming traditional DFA learners and prior LearnAnyWay variants. The approach provides a practical, theoretically grounded framework for integrating LLMs into automata learning and runtime verification tasks, enabling reliable formal reasoning in the presence of persistent oracle errors.

Abstract

The emergence of intelligence in large language models (LLMs) has inspired investigations into their integration into automata learning. This paper introduces the probabilistic Minimally Adequate Teacher (pMAT) formulation, which uses a probabilistic oracle that may give persistent errors at random when answering membership queries for deterministic finite automata (DFA) learning. Given the tendency of LLMs to hallucinate, we develop techniques to improve answer accuracy and ensure the correctness of the learned automata. We propose the $\mathtt{Discrimination}$ and $\mathtt{Verification}$ prompts and explore their advantages over common prompts. Additionally, we compare DFA learning performance between the TTT algorithm and common active learning algorithms. To address the exponential number of persistent errors, we implement a dynamic query cache refinement algorithm that identifies and corrects conflicting queries by combining active and passive learning algorithms. The empirical results demonstrate the robustness and efficiency of our approach, providing a theoretical foundation for automata learning with LLMs in the loop.
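To make the idea of a persistent, randomly erring membership oracle concrete, the following is a minimal Python sketch, not the paper's implementation. The names `PersistentNoisyOracle`, `QueryCache`, `error_rate`, and `refine` are hypothetical, and the sketch assumes that errors are independent per string and that labels obtained from counterexamples can be trusted. The target language (strings starting with `a`) is borrowed from the running example in Figure 2.

```python
import random
from typing import Callable, Dict


def starts_with_a(word: str) -> bool:
    """Ground-truth language used for illustration: strings starting with 'a'."""
    return word.startswith("a")


class PersistentNoisyOracle:
    """Hypothetical membership oracle whose errors are random but persistent.

    The first answer for a string is wrong with probability `error_rate`;
    the same (possibly wrong) answer is returned on every later query,
    so repeating the query cannot reveal the error.
    """

    def __init__(self, target: Callable[[str], bool], error_rate: float, seed: int = 0) -> None:
        self._target = target
        self._error_rate = error_rate
        self._rng = random.Random(seed)
        self._answers: Dict[str, bool] = {}

    def membership(self, word: str) -> bool:
        if word not in self._answers:
            truth = self._target(word)
            flip = self._rng.random() < self._error_rate
            self._answers[word] = (not truth) if flip else truth
        return self._answers[word]


class QueryCache:
    """Hypothetical learner-side cache of membership answers.

    `refine` overwrites a cached answer with a trusted label (assumed here
    to come from a counterexample) and reports whether a conflict was fixed.
    """

    def __init__(self, oracle: PersistentNoisyOracle) -> None:
        self._oracle = oracle
        self._answers: Dict[str, bool] = {}

    def ask(self, word: str) -> bool:
        if word not in self._answers:
            self._answers[word] = self._oracle.membership(word)
        return self._answers[word]

    def refine(self, word: str, label: bool) -> bool:
        conflict = word in self._answers and self._answers[word] != label
        self._answers[word] = label
        return conflict


if __name__ == "__main__":
    oracle = PersistentNoisyOracle(starts_with_a, error_rate=0.2, seed=1)
    cache = QueryCache(oracle)
    for w in ["a", "b", "ab", "ba"]:
        print(w, cache.ask(w), "truth:", starts_with_a(w))
```

Because the error is persistent, majority voting over repeated identical queries does not help in this model; a wrong answer can be repaired only when some other source of information, such as a counterexample, contradicts it.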

Paper Structure (22 sections, 5 figures, 2 tables, 2 algorithms)

Introduction
Related Works
Active Learning Algorithms and the MAT Framework.
Passive Learning Algorithms.
Using LLMs as Oracles.
Preliminaries
Deterministic Finite Automata (DFA).
Minimally Adequate Teacher (MAT).
The pMAT Formulation.
Methods
Verification Prompt.
Discrimination Prompt.
LearnAnyWay with Passive Refinement
Algorithm.
Empirical Results
...and 7 more sections

Figures (5)

Figure 1: Probabilistic minimally adequate teacher formulation with LLMs or humans in the loop.
Figure 2: Discrimination prompt running example: (a) The target automaton, which accepts only strings starting with $a$. (b) The discrimination tree corresponding to the target DFA. The leaves are states of the automaton. Each inner node is a discriminator that distinguishes the states in its left subtree from those in its right subtree. (c) The edit distance between the new query and the cached queries.
Figure 3: Relationship between errors and CE length. The average CE length refers to the length parameter used when generating CEs. Each CE must be longer than this parameter unless no longer counterexample exists. If the average CE length is set to 0, the oracle returns the shortest counterexample. This test runs on a simple automaton that recognizes strings with more than 3 'a's and more than 2 'b's, using LearnAnyWay & TTT as the DFA learner.
Figure 4: Performance comparison of LearnAnyWay, RPNI-EDSM, and LAPR. Both RPNI-EDSM and LAPR can learn the correct DFAs within $5 \times 10^4$ membership queries.
Figure 5: Verification prompt
