
RAIL: Reachability-Aided Imitation Learning for Safe Policy Execution

Wonsuhk Jung, Dennis Anthony, Utkarsh A. Mishra, Nadun Ranawaka Arachchige, Matthew Bronars, Danfei Xu, Shreyas Kousik

TL;DR

RAIL addresses the problem of enforcing hard safety constraints in imitation learning (IL) for robotics by overlaying a reachability-based safety filter on top of state-of-the-art IL policies. It couples offline IL with a receding-horizon safety wrapper consisting of a head-plan verifier and a low-dimensional backup planner, and uses continuous-time collision checking to guarantee safety. The core technical contribution is overapproximating the robot's continuous swept volume with polynomial zonotopes, which enables fast, provable collision checks during planning. Across simulation and real hardware, RAIL achieves a 0% collision rate. Enforcing hard constraints can reduce the task success of the highest-performing policies, some of which succeed partly by violating constraints, but occasionally improves the success of lower-performing ones, and the filter runs in real time (about $0.42 \pm 0.05$ s per plan). The framework is compatible with diffusion-based IL and other offline IL methods, and opens avenues for integrating safety directly into IL training to reduce planning conflicts.
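The head-plan verification and fallback logic described above can be sketched as follows. This is a minimal, hypothetical illustration, not RAIL's implementation: it assumes a kinematic 2-D point mass moving along straight segments, static disk obstacles (for which the segment-to-disk distance is an exact continuous-time collision check), and a trivial "stop" backup in place of RAIL's backup planner and polynomial-zonotope reachable sets. All names (`safety_filter`, `head_len`, `Disk`, and so on) are our own.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class Disk:
    """Circular obstacle, assumed already inflated by the robot's radius."""
    center: Point
    radius: float


def segment_point_distance(a: Point, b: Point, p: Point) -> float:
    """Exact distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Parameter of the closest point on the infinite line, clamped to the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def path_is_safe(path: list[Point], obstacles: list[Disk]) -> bool:
    """Continuous-time check: every point along every segment, not just the waypoints."""
    for a, b in zip(path, path[1:]):
        for obs in obstacles:
            if segment_point_distance(a, b, obs.center) <= obs.radius:
                return False
    return True


def safety_filter(state, proposed_plan, stored_backup, obstacles, head_len=2):
    """One receding-horizon step; returns (path_to_execute, backup_for_next_step).

    `stored_backup` is the backup verified at the previous step; it starts at `state`.
    """
    head = [state, *proposed_plan[:head_len]]
    # Hypothetical backup: stop at the end of the head plan. For this kinematic
    # toy model, stopping is instantaneous and obstacles are static, so the
    # backup is safe whenever the head is; a dynamic robot would instead need
    # a verified braking maneuver here.
    if path_is_safe(head, obstacles):
        return head, [head[-1]]
    # Head plan rejected: fall back to the previously verified backup.
    return stored_backup, stored_backup[-1:]
```

For example, with a disk of radius 0.3 centered at (1, 0), a proposed straight move from (0, 0) to (2, 0) has both waypoints outside the obstacle, yet `path_is_safe` rejects it because the segment passes through the disk; a check that sampled only the waypoints would accept it.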

Abstract

Imitation learning (IL) has shown great success in learning complex robot manipulation tasks. However, there remains a need for practical safety methods to justify widespread deployment. In particular, it is important to certify that a system obeys hard constraints on unsafe behavior in settings where it is unacceptable to design a tradeoff between performance and safety by tuning the policy (i.e., soft constraints). This raises the question: how does enforcing hard constraints affect the performance (meaning safely completing tasks) of an IL policy? To answer this question, this paper builds a reachability-based safety filter to enforce hard constraints on IL, which we call Reachability-Aided Imitation Learning (RAIL). Through evaluations with state-of-the-art IL policies in mobile-robot and manipulation tasks, we make two key findings. First, the highest-performing policies are sometimes high-performing only because they frequently violate constraints, and they lose significant performance under hard constraints. Second, surprisingly, hard constraints on the lower-performing policies can occasionally increase their ability to perform tasks safely. In addition, hardware evaluation confirms that the method can operate in real time.

Paper Structure

This paper contains 12 sections, 4 theorems, 7 equations, 5 figures, 4 tables, and 2 algorithms.

Key Result

Lemma 3

Suppose a 1-D revolute joint travels counterclockwise (CCW) from an angle $\theta_1$ to $\theta_2 > \theta_1$. Map this motion to the unit circle $\mathbb{S}^1 \cong \mathrm{SO}(2)$ as $p_1 = (\cos\theta_1, \sin\theta_1)$ and $p_2 = (\cos\theta_2, \sin\theta_2)$. Define $p_3 = \tfrac{1}{2}(p_1 + p_2)$ …
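As a hedged illustration of why the chord midpoint $p_3$ is a natural anchor for such bounds, the sketch below encloses the arc from $p_1$ to $p_2$ in a rotated box, i.e., a 2-D zonotope with two orthogonal generators. This is our own construction and not necessarily the set constructed in Lemma 3; it assumes $0 < \theta_2 - \theta_1 \le \pi$. Writing $\Delta = \theta_2 - \theta_1$ and $u$ for the unit bisector direction, the midpoint satisfies $p_3 = \cos(\Delta/2)\,u$, so $\|p_3\| = \cos(\Delta/2)$ and the arc bulges at most $1 - \cos(\Delta/2)$ beyond the chord; the box spans exactly that bulge. The function names are hypothetical.

```python
import math


def arc_enclosing_box(theta1: float, theta2: float):
    """Rotated box (a 2-D zonotope) containing the unit-circle arc traced CCW
    from theta1 to theta2. Assumes 0 < theta2 - theta1 <= pi, and that the
    span is not so tiny that 1 - cos(span/2) rounds to zero.

    Returns (center, g_chord, g_bisector); the box is
    {center + a*g_chord + b*g_bisector : |a| <= 1, |b| <= 1}.
    """
    delta = theta2 - theta1
    if not 0.0 < delta <= math.pi:
        raise ValueError("expected 0 < theta2 - theta1 <= pi")
    mid, half = 0.5 * (theta1 + theta2), 0.5 * delta
    u = (math.cos(mid), math.sin(mid))   # unit bisector; p3 = cos(half) * u
    w = (-math.sin(mid), math.cos(mid))  # unit vector parallel to the chord p1 -> p2
    # A point at angle mid + phi, |phi| <= half <= pi/2, equals cos(phi)*u + sin(phi)*w,
    # so its u-coordinate lies in [cos(half), 1] and its w-coordinate in [-sin(half), sin(half)].
    c = 0.5 * (1.0 + math.cos(half))  # midpoint of [cos(half), 1]
    r = 0.5 * (1.0 - math.cos(half))  # half-width of [cos(half), 1]
    center = (c * u[0], c * u[1])
    g_chord = (math.sin(half) * w[0], math.sin(half) * w[1])
    g_bisector = (r * u[0], r * u[1])
    return center, g_chord, g_bisector


def box_contains(box, p, tol=1e-12) -> bool:
    """Membership test; valid because the two generators are orthogonal and nonzero."""
    center, g1, g2 = box
    d = (p[0] - center[0], p[1] - center[1])
    for g in (g1, g2):
        coeff = (d[0] * g[0] + d[1] * g[1]) / (g[0] ** 2 + g[1] ** 2)
        if abs(coeff) > 1.0 + tol:
            return False
    return True
```

The box's chord-side edge passes through $p_3$ (it equals `center - g_bisector`), and the enclosure tightens as $\Delta$ shrinks, since its thickness $1 - \cos(\Delta/2)$ scales roughly as $\Delta^2/8$.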

Figures (5)

  • Figure 1: Our RAIL framework applied to a real-world robotic manipulation task. Top: A Franka robot arm safely executes a pick-and-place task among delicate obstacles. Bottom: System diagram illustrating how RAIL integrates an imitation learning policy with a safety filter, using plan validation and a failsafe planner to enforce hard constraints.
  • Figure 2: We overapproximate the sines and cosines of the robot's joint angles along a trajectory to enable overapproximating the robot's swept volume.
  • Figure 3: Evaluation tasks (left-to-right): (a) Maze-Medium: A point mass has to be taken to a goal in a maze without hitting the walls. (b) Maze-Large: Similar to Maze-Medium, but larger and more difficult to traverse. (c) Can Pick-Place: A can must be picked from a table and placed in a target bin.
  • Figure 4: (Leftmost) Illustration of the Can Pick-Place task: a can must be picked from a table and placed in a target bin. (Remaining panels, left to right) Timelapse of an episode that requires navigating through a tight gap. The diffusion policy repeatedly proposed unsafe plans until RAIL eventually validated a safe path for picking. The IL policy then again proposed unsafe plans for moving the picked object, until RAIL validated a safe path outward for placing.
  • Figure 5: (Left) Visual representation of RAIL in the Maze-Medium task. The diffusion policy's proposed plan (gradient line) is validated by checking the reachable set of head plans (magenta) and the existence of a backup plan (green dotted line) with its reachable set (green tube). (Right) Heatmaps of the number of safety interventions by RAIL in the Maze-Large task over 100 randomly initialized episodes, for diffusion policies at Epoch 1900 (top) and Epoch 50 (bottom).
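Figure 2's idea, bounding the sines and cosines of the joint angles over a trajectory segment and propagating those bounds through forward kinematics, can be sketched for a hypothetical planar 2-link arm. This sketch swaps in plain interval arithmetic for the paper's polynomial zonotopes; intervals discard the correlation between terms that polynomial zonotopes are designed to keep, so the resulting boxes are generally looser. It bounds only the end-effector point, not the full link volume, and all names and link lengths are our own assumptions.

```python
import math

Interval = tuple[float, float]


def cos_interval(lo: float, hi: float) -> Interval:
    """Tight bounds on cos(t) over t in [lo, hi] (requires lo <= hi)."""
    if hi - lo >= 2.0 * math.pi:
        return (-1.0, 1.0)
    vals = [math.cos(lo), math.cos(hi)]
    # Interior extrema: cos(k*pi) = +1 for even k and -1 for odd k.
    k = math.ceil(lo / math.pi)
    while k * math.pi <= hi:
        vals.append(1.0 if k % 2 == 0 else -1.0)
        k += 1
    return (min(vals), max(vals))


def sin_interval(lo: float, hi: float) -> Interval:
    """Bounds on sin(t) over [lo, hi], using sin(t) = cos(t - pi/2)."""
    return cos_interval(lo - math.pi / 2.0, hi - math.pi / 2.0)


def add(a: Interval, b: Interval) -> Interval:
    return (a[0] + b[0], a[1] + b[1])


def scale(a: Interval, s: float) -> Interval:
    """Multiply an interval by a nonnegative scalar (e.g., a link length)."""
    return (s * a[0], s * a[1])


def end_effector_box(q1: Interval, q2: Interval, l1: float = 1.0, l2: float = 1.0):
    """Axis-aligned box containing (up to floating-point rounding) the end effector
    of a planar 2-link arm for every theta1 in q1 and theta2 in q2."""
    q12 = (q1[0] + q2[0], q1[1] + q2[1])  # range of theta1 + theta2
    x = add(scale(cos_interval(*q1), l1), scale(cos_interval(*q12), l2))
    y = add(scale(sin_interval(*q1), l1), scale(sin_interval(*q12), l2))
    return x, y
```

If the joint angles stay within `q1` and `q2` over a time segment, every end-effector position visited during that segment lies in the returned box, so checking the box against obstacles is a conservative continuous-time collision check for that point.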

Theorems & Definitions (9)

  • Remark 2
  • Lemma 3
  • Proof of Lemma 3
  • Lemma 4
  • Proof of Lemma 4
  • Theorem 5
  • Proof of Theorem 5
  • Corollary 6
  • Proof of Corollary 6