Table of Contents
Fetching ...

Online Test Synthesis From Requirements: Enhancing Reinforcement Learning with Game Theory

Ocan Sankur, Thierry Jéron, Nicolas Markey, David Mentré, Reiya Noguchi

TL;DR

The study addresses automatic online synthesis of test cases for black-box reactive systems using automata-based requirements. It introduces a Monte Carlo Tree Search framework augmented with game-theoretic greedy strategies and reward shaping to accelerate convergence toward test objectives while preserving completeness guarantees. The key contribution is a Greedy-MCTS algorithm that biases both tree and roll-out policies using controllable predecessor concepts and cooperative states, demonstrated to dramatically improve success rates in a case study. This approach offers a scalable, principled method for efficient online conformance testing with practical implications for industrial testing of complex reactive systems.

Abstract

We consider the automatic online synthesis of black-box test cases from functional requirements specified as automata for reactive implementations. The goal of the tester is to reach some given state, so as to satisfy a coverage criterion, while monitoring the violation of the requirements. We develop an approach based on Monte Carlo Tree Search, which is a classical technique in reinforcement learning for efficiently selecting promising inputs. Seeing the automata requirements as a game between the implementation and the tester, we develop a heuristic by biasing the search towards inputs that are promising in this game. We experimentally show that our heuristic accelerates the convergence of the Monte Carlo Tree Search algorithm, thus improving the performance of testing.

Online Test Synthesis From Requirements: Enhancing Reinforcement Learning with Game Theory

TL;DR

The study addresses automatic online synthesis of test cases for black-box reactive systems using automata-based requirements. It introduces a Monte Carlo Tree Search framework augmented with game-theoretic greedy strategies and reward shaping to accelerate convergence toward test objectives while preserving completeness guarantees. The key contribution is a Greedy-MCTS algorithm that biases both tree and roll-out policies using controllable predecessor concepts and cooperative states, demonstrated to dramatically improve success rates in a case study. This approach offers a scalable, principled method for efficient online conformance testing with practical implications for industrial testing of complex reactive systems.

Abstract

We consider the automatic online synthesis of black-box test cases from functional requirements specified as automata for reactive implementations. The goal of the tester is to reach some given state, so as to satisfy a coverage criterion, while monitoring the violation of the requirements. We develop an approach based on Monte Carlo Tree Search, which is a classical technique in reinforcement learning for efficiently selecting promising inputs. Seeing the automata requirements as a game between the implementation and the tester, we develop a heuristic by biasing the search towards inputs that are promising in this game. We experimentally show that our heuristic accelerates the convergence of the Monte Carlo Tree Search algorithm, thus improving the performance of testing.
Paper Structure (29 sections, 3 theorems, 7 equations, 8 figures, 1 table)

This paper contains 29 sections, 3 theorems, 7 equations, 8 figures, 1 table.

Key Result

Lemma 4.0.1

For each requirement set $\calR$, implementation $I$, and test objective $O$, there exists $K>0$ such that if $O$ is reachable in ${\calA_\calR} \otimes I$, then $\textsf{uniform}_K$ finds a covering trace with probability $1$.

Figures (8)

  • Figure 1: Example of a (deterministic) automaton expressing requirements.
  • Figure 2: The carriage control system example from PLC-mitsu.
  • Figure 3: An automaton modeling requirements R1, R2, R3. For readability, we omitted some transitions in the figure: from all output states, there is an additional transition guarded by $\movebwd \land \movefwd$ to err. Moreover, from each of the two rightmost output states, the negation of the guard of the only leaving transition goes to err as well. For instance, from $s_1$, reading $\lnot \fwdlimit$ and then $\lnot \movefwd$, we end in err. Note that reading $\lnot \bwdlimit$ in $s_0$ is also an error since this is not supposed to happen in this system.
  • Figure 4: An implementation $I_1$ and its product with the automaton of Fig. \ref{['fig-exaut']}
  • Figure 5: Construction of the $(W_i)_i$ hierarchy and cooperative strategies
  • ...and 3 more figures

Theorems & Definitions (16)

  • Definition 2.0.1
  • Definition 2.0.2
  • Example 2.0.1
  • Example 2.0.2: Factory Carriage Example
  • Definition 3.0.1
  • Definition 3.0.2
  • Definition 3.0.3
  • Definition 3.0.4
  • Example 3.0.1
  • Example 4.0.1
  • ...and 6 more