Formal Methods in Robot Policy Learning and Verification: A Survey on Current Techniques and Future Directions

Anastasios Manganaris; Vittorio Giammarino; Ahmed H. Qureshi; Suresh Jagannathan

Formal Methods in Robot Policy Learning and Verification: A Survey on Current Techniques and Future Directions

Anastasios Manganaris, Vittorio Giammarino, Ahmed H. Qureshi, Suresh Jagannathan

TL;DR

This survey analyzes the burgeoning role of formal methods (FM) in learning-based robotics, addressing the challenge that deep neural policies are powerful yet opaque and potentially unsafe. It organizes FM applications into policy learning (online and offline) and policy verification (discrete abstractions, reachability, and certificate-based methods), comparing their expressiveness, scalability, and assumptions. Key contributions include a unified taxonomy, critical risk–benefit assessments of automata-based and non-automata approaches, and a roadmap of future directions such as more expressive specifications, probabilistic and partial-observability modeling, and scalable verification for large policy models. The work underscores FM’s potential to deliver interpretable, verifiable, and safer robot policies, guiding future research toward real-world deployment of dependable learned robotics systems.

Abstract

As hardware and software systems have grown in complexity, formal methods have been indispensable tools for rigorously specifying acceptable behaviors, synthesizing programs to meet these specifications, and validating the correctness of existing programs. In the field of robotics, a similar trend of rising complexity has emerged, driven in large part by the adoption of deep learning. While this shift has enabled the development of highly performant robot policies, their implementation as deep neural networks has posed challenges to traditional formal analysis, leading to models that are inflexible, fragile, and difficult to interpret. In response, the robotics community has introduced new formal and semi-formal methods to support the precise specification of complex objectives, guide the learning process to achieve them, and enable the verification of learned policies against them. In this survey, we provide a comprehensive overview of how formal methods have been used in recent robot learning research. We organize our discussion around two pillars: policy learning and policy verification. For both, we highlight representative techniques, compare their scalability and expressiveness, and summarize how they contribute to meaningfully improving realistic robot safety and correctness. We conclude with a discussion of remaining obstacles for achieving that goal and promising directions for advancing formal methods in robot learning.

Formal Methods in Robot Policy Learning and Verification: A Survey on Current Techniques and Future Directions

TL;DR

Abstract

Paper Structure (44 sections, 2 equations, 6 figures)

This paper contains 44 sections, 2 equations, 6 figures.

Introduction
Scope and Contribution
Outline
Preliminaries for Formal Methods and Specifications
Temporal Logic
Automata
Formal Synthesis and Verification
Formal Methods in Robot Policy Learning
Online Policy Learning
Automata-Theoretic Reinforcement Learning
Automata-Guided Reinforcement Learning
Automata-Free Reinforcement Learning
Programmatic Reinforcement Learning
Model-Based Reinforcement Learning
Offline Policy Learning
...and 29 more sections

Figures (6)

Figure 1: Our survey first provides background information on FM and their requisite specifications, denoted by $\phi$. We then survey recent work on FM in robotics in two categories: work on FM-informed learning of robot policies ($\pi$), and work on verifying that learned robot policies satisfy specifications ($\pi \models \phi$).
Figure 2: A typical discrete domain modeling a single agent in a building with several rooms. The model is defined with atomic propositions corresponding to the agent's presence in each room and the occupancy status of the conference room ($AP = \{R, O_1, O_2, C, B, C_{\text{busy}} \}$ using the symbols in \ref{['subfig:environment']}). The example behavior in blue depicts the agent exiting the starting room, cycling infinitely between visiting the offices, and at some point visiting the break-room. An example LTL expression (\ref{['subfig:ltl']}) can be used to specify complex behaviors and is evaluated over sequences of labels (\ref{['subfig:labeling']}) obtained via the labeling function $L : S \rightarrow 2^{AP}$.
Figure 3: A DBA for $\phi = \square \lozenge O_1 \wedge \square \lozenge O_2$. Intuitively, as the agent periodically encounters states $s$ satisfying $O_1 \in L(s)$ and $O_2 \in L(s)$, the automaton moves from $q_0$, to $q_1$, to $q_2$, and back to $q_0$. As a result, the accepting state $q_0 \in F$ is visited infinitely often and the sequence is deemed to satisfy $\phi$.
Figure 4: The research in FM-informed policy learning can be organized into categories corresponding to five major robot policy learning frameworks. The first two, model-free and model-based RL methods, are online methods that rely on environment interaction during learning. Within model-free RL, there are algorithms that learn from automata-theoretic rewards, algorithms that use automata for guidance without explicit rewards, algorithms that do not rely on automata at all, and algorithms based on programmatic policies. Model-based RL primarily extends optimal control techniques, such as MPC, with DL components to enhance performance or versatility. The remaining three categories, IRL, Behavior Cloning, and offline RL, are primarily offline methods that learn from pre-collected demonstration datasets. Their constituent subfields include learning FS as reward functions for IRL, augmenting behavior cloning losses with FS, and training sequence models to generate actions conditioned on high FS satisfaction.
Figure 5: Our survey of robot policy verification methods is organized by three major categories of temporal FS, visualized here using the temporal property hierarchy of manna1990hierarchy. The first category (top), primarily represented by methods using discrete system abstractions, is defined by support for general $\omega$-regular FS. The second category (middle) supports the more limited "obligation" class of FS and is primarily represented by reachability analysis methods. The final category (bottom) supports particular forms of safety (invariance) and guarantee (eventual stability) FS and is represented by certificate function methods. The typical capabilities of these methods are included and approximately rated as either poor in red, moderate in orange, or good in green.
...and 1 more figures

Formal Methods in Robot Policy Learning and Verification: A Survey on Current Techniques and Future Directions

TL;DR

Abstract

Formal Methods in Robot Policy Learning and Verification: A Survey on Current Techniques and Future Directions

Authors

TL;DR

Abstract

Table of Contents

Figures (6)