Follow The Rules: Online Signal Temporal Logic Tree Search for Guided Imitation Learning in Stochastic Domains

Jasmine Jerry Aloor, Jay Patrikar, Parv Kapoor, Jean Oh, Sebastian Scherer

TL;DR

This paper addresses safe, rule-compliant planning for learning-based agents by guiding an offline Learning-from-Demonstrations (LfD) policy online with Signal Temporal Logic (STL) robustness inside a Monte Carlo Tree Search (MCTS) framework. The method augments the MCTS heuristic with STL robustness to bias exploration toward trajectories that satisfy spatio-temporal constraints, and is demonstrated on general aviation planning around a non-towered airfield. Using a GoalGAIL-based offline policy and TrajAir data, the approach achieves better constraint satisfaction and higher STL robustness than the baselines, including in challenging landing scenarios. Because the underlying offline policy is left unchanged, the work enables rule-aware decision-making in continuous spaces, which is a practical benefit for deployment in safety-critical domains.

Abstract

Seamlessly integrating rules in Learning-from-Demonstrations (LfD) policies is a critical requirement to enable the real-world deployment of AI agents. Recently, Signal Temporal Logic (STL) has been shown to be an effective language for encoding rules as spatio-temporal constraints. This work uses Monte Carlo Tree Search (MCTS) as a means of integrating STL specification into a vanilla LfD policy to improve constraint satisfaction. We propose augmenting the MCTS heuristic with STL robustness values to bias the tree search towards branches with higher constraint satisfaction. While the domain-independent method can be applied to integrate STL rules online into any pre-trained LfD algorithm, we choose goal-conditioned Generative Adversarial Imitation Learning as the offline LfD policy. We apply the proposed method to the domain of planning trajectories for General Aviation aircraft around a non-towered airfield. Results using the simulator trained on real-world data showcase 60% improved performance over baseline LfD methods that do not use STL heuristics.
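To make the idea of an STL-augmented search heuristic concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact heuristic, so the additive form, the PUCT-style exploration term, the weight `w_stl`, and all function names below are assumptions chosen for illustration. The robustness functions follow the standard quantitative STL semantics, where "eventually" takes the maximum of a predicate margin over time and "always" takes the minimum.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, x_max, y_min, y_max)


def box_margin(p: Point, box: Box) -> float:
    """Signed margin of a point w.r.t. a box: positive inside, negative outside."""
    x, y = p
    x_min, x_max, y_min, y_max = box
    return min(x - x_min, x_max - x, y - y_min, y_max - y)


def eventually_in(traj: Sequence[Point], box: Box) -> float:
    """Robustness of 'eventually inside box': best margin along the trajectory."""
    return max(box_margin(p, box) for p in traj)


def always_in(traj: Sequence[Point], box: Box) -> float:
    """Robustness of 'always inside box': worst margin along the trajectory."""
    return min(box_margin(p, box) for p in traj)


def augmented_score(q_value: float, policy_prior: float,
                    parent_visits: int, child_visits: int,
                    robustness: float,
                    c_explore: float = 1.0, w_stl: float = 0.5) -> float:
    """Hypothetical node score: value + prior-weighted exploration + STL bonus.

    The additive STL term and its weight are illustrative assumptions,
    not the formula used in the paper.
    """
    explore = c_explore * policy_prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q_value + explore + w_stl * robustness


# Toy rollout that passes through a rectangular region once.
traj = [(0.0, 0.0), (1.0, 1.0), (3.0, 1.0)]
region = (0.5, 2.5, 0.5, 1.5)
rho_eventually = eventually_in(traj, region)
rho_always = always_in(traj, region)
score = augmented_score(0.0, 1.0, 4, 1, rho_eventually)
```

In this toy case the rollout satisfies the "eventually" specification (positive robustness) but violates the "always" one (negative robustness), so a branch scored with the former would be favored during selection, while one scored with the latter would be penalized.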

Paper Structure (18 sections, 10 equations, 4 figures, 1 table, 2 algorithms)


Figures (4)

  • Figure 1: The proposed approach in a prototypical scenario: planning for an aircraft landing at a non-towered airfield. The expert/pilot demonstrations (grey) are used offline to train an LfD policy; online, the Signal Temporal Logic specifications are used in the MCTS expansion to ensure rule compliance.
  • Figure 2: Overview of the approach: offline, an LfD policy is trained on datasets; online, that policy is used in a Monte Carlo Tree Search (MCTS). The online expansion uses a modified heuristic that incorporates robustness values from the Signal Temporal Logic (STL) specification to guide the search toward higher rule conformance.
  • Figure 3: The goal is represented as a one-hot vector $G$ in which each region corresponds to one element of the vector. The maroon rectangle shows airport traffic patterns.
  • Figure 4: A qualitative example from one of the cases, in which the aircraft starts from the South-West and needs to land at one of the runways (R26). The specifications $\Phi_1$, $\Phi_2$ and $\Phi_3$ are shown as rectangles. White marked lines show the aircraft trajectory, and magenta lines show the MCTS tree. The runway threshold for R08 (+x-axis) is at the center.