Model-Based Runtime Monitoring with Interactive Imitation Learning

Huihan Liu; Shivin Dass; Roberto Martín-Martín; Yuke Zhu

TL;DR

The paper tackles generalization and reliability in robot learning by integrating a model-based runtime monitoring mechanism into interactive imitation learning. It learns a latent-space dynamics model, implemented as a conditional variational autoencoder, together with a failure classifier to anticipate future errors, enabling preemptive human interventions without requiring explicit failure data. The approach unifies out-of-distribution detection and failure prediction in a single framework and continually updates from deployment experience, reducing human workload while maintaining reliable task execution. Experiments in simulation and on real hardware show improved system-level performance and more accurate error prediction in unit tests.

Abstract

Robot learning methods have recently made great strides, but generalization and robustness challenges still hinder their widespread deployment. Failing to detect and address potential failures renders state-of-the-art learning systems not combat-ready for high-stakes tasks. Recent advances in interactive imitation learning have presented a promising framework for human-robot teaming, enabling robots to operate safely and continually improve their performance over long-term deployments. Nonetheless, existing methods typically require constant human supervision and preemptive feedback, limiting their practicality in realistic domains. This work aims to endow a robot with the ability to monitor and detect errors during task execution. We introduce a model-based runtime monitoring algorithm that learns from deployment data to detect system anomalies and anticipate failures. Unlike prior work that cannot foresee future failures or requires failure experiences for training, our method learns a latent-space dynamics model and a failure classifier, which let it simulate future action outcomes and detect out-of-distribution and high-risk states preemptively. We train our method within an interactive imitation learning framework, where it continually updates the model from the experiences of the human-robot team collected using trustworthy deployments. Consequently, our method reduces the human workload needed over time while ensuring reliable task execution. Our method outperforms the baselines across system-level and unit-test metrics, with 23% and 40% higher success rates in simulation and on physical hardware, respectively. More information at https://ut-austin-rpl.github.io/sirius-runtime-monitor/
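
The monitoring idea in the abstract (imagine future latent states, then flag high-risk or unfamiliar ones before they occur) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the names `rollout_latents` and `monitor_step`, the thresholds, the toy policy, dynamics, and failure classifier, and the use of the spread across sampled futures as an out-of-distribution proxy.

```python
import numpy as np

def rollout_latents(z0, policy, dynamics, horizon, num_samples, rng):
    """Sample `num_samples` imagined latent trajectories of length `horizon`."""
    trajs = np.empty((num_samples, horizon, z0.shape[0]))
    for k in range(num_samples):
        z = z0
        for t in range(horizon):
            a = policy(z)              # action the policy would take
            z = dynamics(z, a, rng)    # stochastic next-latent sample
            trajs[k, t] = z
    return trajs

def monitor_step(z0, policy, dynamics, failure_prob, rng,
                 horizon=5, num_samples=8, fail_thresh=0.5, ood_thresh=1.0):
    """Decide whether to request a human intervention (hypothetical rule)."""
    trajs = rollout_latents(z0, policy, dynamics, horizon, num_samples, rng)
    # Risk: highest sample-averaged failure probability over the horizon.
    probs = failure_prob(trajs)                      # shape (K, H)
    risk = float(probs.mean(axis=0).max())
    # OOD proxy (assumption): largest total variance across sampled futures.
    spread = float(trajs.var(axis=0).sum(axis=-1).max())
    ask_human = risk > fail_thresh or spread > ood_thresh
    return {"risk": risk, "spread": spread, "ask_human": ask_human}

# Toy stand-ins for the learned components (all hypothetical).
rng = np.random.default_rng(0)
policy = lambda z: -0.1 * z
dynamics = lambda z, a, rng: z + a + 0.01 * rng.standard_normal(z.shape)
failure_prob = lambda zs: 1.0 / (1.0 + np.exp(-(np.linalg.norm(zs, axis=-1) - 3.0)))

safe = monitor_step(np.zeros(4), policy, dynamics, failure_prob, rng)
risky = monitor_step(np.full(4, 5.0), policy, dynamics, failure_prob, rng)
print(safe["ask_human"], risky["ask_human"])
```

In this toy setup the state near the origin stays below both thresholds, while the state far from it triggers a request for help before any failure is actually reached. A learned system would replace the lambdas with trained networks and tune the thresholds on deployment data.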

Paper Structure (15 sections, 1 equation, 5 figures, 3 tables)

Introduction
Related Work
Model-based Runtime Monitoring
Problem Formulation
Model-based Method Design
Runtime Monitor in Operation: Modules
Runtime Monitoring in Operation: System
Experiments
System-Level Performance
Evaluation Protocol
Baselines
Results
Unit Testing Error Predictors
Ablation Study
Conclusion and Limitations

Figures (5)

Figure 1: Overview. We introduce a model-based runtime monitoring algorithm that continuously learns to predict errors from deployment data. We integrate this runtime monitoring algorithm into an interactive imitation learning framework to ensure trustworthy long-term deployment.
Figure 2: Model Architecture. We train a dynamics model, implemented as a conditional Variational Autoencoder (cVAE), to predict the next latent state given the current state and action. We also train a policy and a failure classifier head based on the latent state. The dynamics model and policy are trained from the experiences collected from task execution. The failure classifier uses the human intervention states to infer failure states.
Figure 3: Normalized ROHE curves over three rounds of iterative deployment for the tasks Square Nut Assembly (left), Threading (center-left), Coffee Pod Packing (center-right), and Gear Assembly (right). Our method generally has lower ROHE in the first round because of higher initial human engagement; ROHE improves in later rounds as our method becomes more effective at identifying important errors during deployment.
Figure 4: Ablation Study. Ours (Full) achieves higher overall performance than Ours Failure and Ours OOD. This result indicates that the failure detection and OOD detection modules complement each other and that both contribute to the overall success of our system.
Figure 5: Qualitative results from ablation study. The OOD detection is able to identify unfamiliar states at a coarser level while the failure detection identifies finer-grained failures.
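
As a rough illustration of the architecture described in the Figure 2 caption, the sketch below shows how a cVAE-style dynamics model can produce several candidate next latent states at inference time: a noise code is drawn from a standard normal prior and decoded together with the current latent state and action. The class `ToyCVAEDynamics`, the function `failure_head`, the dimensions, and the random untrained weights are all hypothetical; the paper's actual networks and training losses are not reproduced here.

```python
import numpy as np

class ToyCVAEDynamics:
    """Hypothetical cVAE decoder: next latent from (latent, action, noise code)."""

    def __init__(self, latent_dim, action_dim, code_dim, rng):
        self.code_dim = code_dim
        in_dim = latent_dim + action_dim + code_dim
        # Random, untrained weights; a real model would learn these.
        self.W = 0.1 * rng.standard_normal((latent_dim, in_dim))

    def decode(self, z, a, c):
        # Residual prediction: next latent = current latent + learned change.
        return z + self.W @ np.concatenate([z, a, c])

    def sample_next(self, z, a, rng, num_samples=1):
        # At inference time, codes come from the standard normal prior.
        codes = rng.standard_normal((num_samples, self.code_dim))
        return np.stack([self.decode(z, a, c) for c in codes])

def failure_head(z, w, b):
    """Hypothetical logistic failure classifier on a latent state."""
    return 1.0 / (1.0 + np.exp(-(w @ z + b)))

rng = np.random.default_rng(0)
model = ToyCVAEDynamics(latent_dim=8, action_dim=2, code_dim=3, rng=rng)
z, a = np.zeros(8), np.zeros(2)
next_latents = model.sample_next(z, a, rng, num_samples=5)
w, b = rng.standard_normal(8), 0.0
fail_probs = np.array([failure_head(zn, w, b) for zn in next_latents])
print(next_latents.shape, fail_probs.shape)
```

Drawing several codes yields a set of plausible futures for the same state and action, which is what allows a monitor to score each imagined state with the failure head.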
