RAIL: Reachability-Aided Imitation Learning for Safe Policy Execution

Wonsuhk Jung; Dennis Anthony; Utkarsh A. Mishra; Nadun Ranawaka Arachchige; Matthew Bronars; Danfei Xu; Shreyas Kousik

TL;DR

RAIL enforces hard safety constraints in imitation learning (IL) for robotics by wrapping state-of-the-art IL policies in a reachability-based safety filter. It couples an offline IL policy with a receding-horizon safety wrapper consisting of a head-plan verifier and a low-dimensional backup planner, and uses continuous-time collision checking to guarantee safety. The core technical contribution is overapproximating the robot's continuous swept volume with polynomial zonotopes, which enables fast, provable collision checks during planning. Empirical results in simulation and on real hardware show that RAIL achieves 0% collision rates while maintaining or sometimes improving task success, indicating that hard safety constraints can occasionally improve performance and are feasible in real time (about $0.42 \pm 0.05$ s per plan). The framework is compatible with diffusion-based IL and other offline IL methods, and opens avenues for integrating safety directly into IL training to reduce planning conflicts.
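The receding-horizon wrapper described above (verify the head of the proposed plan, require a backup plan from its end state, otherwise keep executing the previously committed plan) can be illustrated on a toy problem. The following is a minimal sketch, not the paper's implementation: it assumes a 1-D double integrator, a single wall at `x = WALL`, and a braking maneuver as the backup plan. All names and constants (`DT`, `A_MAX`, `WALL`, `verify`, `braking_plan`, `safety_filter`, `simulate`) are hypothetical, and the exact per-step maximum used here stands in for the polynomial-zonotope reachable sets.

```python
from typing import List, Optional, Tuple

DT = 0.1      # duration of one control step [s] (assumed)
A_MAX = 2.0   # acceleration and braking limit [m/s^2] (assumed)
WALL = 1.0    # the robot must keep x < WALL (assumed obstacle)

State = Tuple[float, float]  # (position, velocity)


def step(state: State, accel: float) -> State:
    """Advance the double integrator by one step of constant acceleration."""
    x, v = state
    return (x + v * DT + 0.5 * accel * DT**2, v + accel * DT)


def max_position_over_step(state: State, accel: float) -> float:
    """Largest position reached at any time in [0, DT], not only at the ends."""
    x, v = state
    candidates = [x, step(state, accel)[0]]
    if accel != 0.0:
        t_star = -v / accel  # time at which the velocity crosses zero
        if 0.0 < t_star < DT:
            candidates.append(x + v * t_star + 0.5 * accel * t_star**2)
    return max(candidates)


def verify(state: State, plan: List[float]) -> Optional[State]:
    """Return the end state if the whole plan stays below WALL, else None."""
    for accel in plan:
        if max_position_over_step(state, accel) >= WALL:
            return None
        state = step(state, accel)
    return state


def braking_plan(state: State) -> List[float]:
    """Backup plan: decelerate within the limit until the robot is at rest."""
    x, v = state
    plan = []
    while abs(v) > 1e-9:
        accel = max(-A_MAX, min(A_MAX, -v / DT))
        plan.append(accel)
        x, v = step((x, v), accel)
    return plan


def safety_filter(state: State, proposed: List[float],
                  committed: List[float], head_len: int = 2):
    """Return (action to execute now, remainder of the committed plan)."""
    head = proposed[:head_len]
    end = verify(state, head)
    if end is not None:
        backup = braking_plan(end)
        if verify(end, backup) is not None:
            committed = head + backup  # accept: head plan followed by backup
    if not committed:
        return 0.0, []                 # committed plan exhausted: stay at rest
    return committed[0], committed[1:]


def simulate(steps: int = 60) -> List[State]:
    """Run the filter against a stand-in policy that always accelerates."""
    state, committed = (0.0, 0.0), []
    trajectory = [state]
    for _ in range(steps):
        proposed = [A_MAX] * 8         # deliberately unsafe "policy" output
        accel, committed = safety_filter(state, proposed, committed)
        state = step(state, accel)
        trajectory.append(state)
    return trajectory
```

Calling `simulate()` returns the visited `(position, velocity)` pairs. Because a new plan is only committed when both its head and its braking backup verify, the robot in this toy setting always has a previously verified plan to fall back on when the policy's proposal is rejected.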

Abstract

Imitation learning (IL) has shown great success in learning complex robot manipulation tasks. However, there remains a need for practical safety methods to justify widespread deployment. In particular, it is important to certify that a system obeys hard constraints on unsafe behavior in settings where it is unacceptable to trade off performance against safety by tuning the policy (i.e., soft constraints). This leads to the question: how does enforcing hard constraints impact the performance (meaning safely completing tasks) of an IL policy? To answer this question, this paper builds a reachability-based safety filter to enforce hard constraints on IL, which we call Reachability-Aided Imitation Learning (RAIL). Through evaluations with state-of-the-art IL policies on mobile robot and manipulation tasks, we make two key findings. First, the highest-performing policies are sometimes only so because they frequently violate constraints, and they lose significant performance under hard constraints. Second, surprisingly, hard constraints on the lower-performing policies can occasionally increase their ability to perform tasks safely. Finally, hardware evaluation confirms the method can operate in real time.

Paper Structure (12 sections, 4 theorems, 7 equations, 5 figures, 4 tables, 2 algorithms)

Introduction
Related Work
Problem Statement and Preliminaries
Framework Overview
Motion Policy
Reachability-Assisted Imitation Learning
Continuous-Time Collision Checking
Results
Simulation: Safe Planning in Maze
Simulation: Safe Planning in Pick-and-place
Real World Evaluation
Conclusion

Key Result

Lemma 3

Suppose a 1-D revolute joint travels counterclockwise (CCW) from an angle $\theta_1$ to $\theta_2 > \theta_1$. Map this motion to the unit circle $\mathrm{SO}(1)$ as $p_1 = (\cos(\theta_1),\sin(\theta_1))$ and $p_2 = (\cos(\theta_2),\sin(\theta_2))$. Define $p_3 = \tfrac{1}{2}(p_1 + p_2 \dots$

Figures (5)

Figure 1: Our RAIL framework applied to a real-world robotic manipulation task. Top: A Franka robot arm safely executes a pick-and-place task among delicate obstacles. Bottom: System diagram illustrating how RAIL integrates an imitation learning policy with a safety filter, using plan validation and a failsafe planner to enforce hard constraints.
Figure 2: We overapproximate the sines and cosines of the robot's joint angles along a trajectory to enable overapproximating the robot's swept volume.
Figure 3: Evaluation tasks (left-to-right): (a) Maze-Medium: A point mass has to be taken to a goal in a maze without hitting the walls. (b) Maze-Large: Similar to Maze-Medium, but larger and more difficult to traverse. (c) Can Pick-Place: A can must be picked from a table and placed in a target bin.
Figure 4: (Leftmost) Illustration of Can Pick-Place task: A can must be picked from a table and placed in a target bin. (Rightward to Rightmost) Timelapse of an episode involving navigating through a tight gap. A diffusion policy repeatedly proposed unsafe plans, until eventually RAIL validated a safe path to picking. The IL policy again proposed unsafe plans to move the picked object, until RAIL validated a safe path outwards to placing.
Figure 5: (Left) Visual representation of RAIL in the Maze-Medium task. The diffusion policy's proposed plan (gradient line) is validated by checking the reachable set of head plans (magenta) and the existence of a backup plan (green dotted line), with its reachable set (green tube). (Right) Heatmap comparing the number of safety interventions by RAIL in the Maze-Large task over 100 episodes with random initializations, comparing diffusion policies at Epoch 1900 (Top) and Epoch 50 (Bottom).
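The idea in the Figure 2 caption, bounding the sines and cosines of a joint angle over a whole motion rather than at sampled instants, can be illustrated with plain interval bounds. This is a minimal sketch under stated assumptions, not the paper's polynomial-zonotope construction: the function names `cos_sin_bounds` and `_contains_angle` are hypothetical, and the result is an axis-aligned box containing every $(\cos\theta, \sin\theta)$ for $\theta \in [\theta_1, \theta_2]$.

```python
import math
from typing import Tuple

Interval = Tuple[float, float]


def _contains_angle(lo: float, hi: float, target: float) -> bool:
    """True if target + 2*pi*k lies in [lo, hi] for some integer k."""
    k = math.ceil((lo - target) / (2.0 * math.pi))
    return target + 2.0 * math.pi * k <= hi


def cos_sin_bounds(theta_lo: float, theta_hi: float) -> Tuple[Interval, Interval]:
    """Bound cos and sin over every angle in [theta_lo, theta_hi]."""
    if theta_hi < theta_lo:
        raise ValueError("expected theta_lo <= theta_hi")
    # Start from the endpoint values, which are the extrema when the
    # functions are monotone on the interval.
    cos_ends = (math.cos(theta_lo), math.cos(theta_hi))
    sin_ends = (math.sin(theta_lo), math.sin(theta_hi))
    cos_lo, cos_hi = min(cos_ends), max(cos_ends)
    sin_lo, sin_hi = min(sin_ends), max(sin_ends)
    # Widen to +/-1 whenever an interior extremum falls inside the interval.
    if _contains_angle(theta_lo, theta_hi, 0.0):
        cos_hi = 1.0
    if _contains_angle(theta_lo, theta_hi, math.pi):
        cos_lo = -1.0
    if _contains_angle(theta_lo, theta_hi, math.pi / 2.0):
        sin_hi = 1.0
    if _contains_angle(theta_lo, theta_hi, -math.pi / 2.0):
        sin_lo = -1.0
    return (cos_lo, cos_hi), (sin_lo, sin_hi)
```

For example, `cos_sin_bounds(3.0, 3.5)` widens the cosine lower bound to `-1.0` because the interval contains $\pi$, which a check at the two endpoints alone would miss. A box of this kind is typically looser than a set representation that tracks the dependence between sine and cosine.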

Theorems & Definitions (9)

Remark 2
Lemma 3
proof
Lemma 4
proof
Theorem 5
proof
Corollary 6
proof
