Real-Time Anomaly Detection and Reactive Planning with Large Language Models
Rohan Sinha, Amine Elhafsi, Christopher Agia, Matthew Foutter, Edward Schmerling, Marco Pavone
TL;DR
This work tackles real-time anomaly detection and safe reactive planning for autonomous robots under distribution shift. It introduces AESOP, a two-stage framework that uses LLM embeddings for fast anomaly detection and autoregressive LLM reasoning for hazard assessment. The fast stage grounds observations in an LLM embedding space, comparing them against embeddings of prior nominal experiences to flag deviations quickly; the slow stage then uses generative reasoning to decide on safety interventions. Both stages are integrated into a model predictive control scheme that keeps multiple recovery trajectories feasible to accommodate the LLM's latency. The authors show that embedding-based detectors built on small models can outperform high-capacity generative baselines in many scenarios, and that accounting for the slow reasoner's latency preserves both safety and task progress in simulated and real quadrotor experiments as well as in CARLA-based autonomous-vehicle experiments. The results suggest that on-device, embedding-based monitoring combined with latency-aware planning can meaningfully improve the reliability and safety of agile robots; promising directions include reducing latency, refining recovery-region selection, and enabling continual learning.
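The fast stage can be pictured as a nearest-neighbor test in embedding space. The sketch below is an illustrative assumption rather than the paper's exact scoring rule: it scores an observation embedding by one minus its mean cosine similarity to the `k` closest embeddings of nominal experiences, and calibrates the alarm threshold as a quantile of scores on held-out nominal data. The names `EmbeddingAnomalyDetector` and `cosine_similarities` are hypothetical, and small random vectors stand in for LLM embeddings of observation descriptions.

```python
import numpy as np


def cosine_similarities(queries: np.ndarray, bank: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of `queries` and rows of `bank`."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    return q @ b.T


class EmbeddingAnomalyDetector:
    """Hypothetical fast-stage monitor: flags embeddings far from all nominal ones."""

    def __init__(self, nominal_embeddings: np.ndarray, k: int = 5):
        self.bank = np.asarray(nominal_embeddings, dtype=float)  # memory of nominal experience
        self.k = k
        self.threshold = None  # set by calibrate()

    def score(self, embeddings: np.ndarray) -> np.ndarray:
        """Per-row score: 1 - mean cosine similarity to the k nearest nominal embeddings."""
        emb = np.atleast_2d(np.asarray(embeddings, dtype=float))
        sims = cosine_similarities(emb, self.bank)
        top_k = np.sort(sims, axis=1)[:, -self.k:]  # k largest similarities per row
        return 1.0 - top_k.mean(axis=1)

    def calibrate(self, held_out_nominal: np.ndarray, quantile: float = 0.95) -> None:
        """Choose the threshold so roughly (1 - quantile) of held-out nominal data is flagged."""
        self.threshold = float(np.quantile(self.score(held_out_nominal), quantile))

    def is_anomalous(self, embedding: np.ndarray) -> bool:
        if self.threshold is None:
            raise RuntimeError("call calibrate() first")
        return bool(self.score(embedding)[0] > self.threshold)


# Usage with synthetic stand-ins for LLM embeddings (all values hypothetical).
rng = np.random.default_rng(0)
direction = np.array([1.0, 0.0, 0.0, 0.0])
nominal = direction + 0.05 * rng.standard_normal((200, 4))
detector = EmbeddingAnomalyDetector(nominal, k=5)
detector.calibrate(direction + 0.05 * rng.standard_normal((50, 4)))
print(detector.is_anomalous(direction))                       # lies in the nominal cluster
print(detector.is_anomalous(np.array([0.0, 1.0, 0.0, 0.0])))  # orthogonal to every nominal embedding
```

Because each check costs one embedding call plus a similarity search, a test of this kind can run far more often than autoregressive generation; a large memory of nominal experiences would typically sit behind an approximate nearest-neighbor index rather than the dense matrix product used here.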
Abstract
Foundation models, e.g., large language models (LLMs), trained on internet-scale data possess zero-shot generalization capabilities that make them a promising technology for detecting and mitigating out-of-distribution failure modes of robotic systems. Fully realizing this promise, however, poses two challenges: (i) mitigating the considerable computational expense of these models so that they can be applied online, and (ii) incorporating their judgement regarding potential anomalies into a safe control framework. In this work, we present a two-stage reasoning framework: first, a fast binary anomaly classifier analyzes observations in an LLM embedding space; a detected anomaly may then trigger a slower fallback selection stage that utilizes the reasoning capabilities of generative LLMs. These stages correspond to branch points in a model predictive control strategy that maintains the joint feasibility of continuing along various fallback plans, so that the slow reasoner's latency is accounted for as soon as an anomaly is detected, thus ensuring safety. We show that our fast anomaly classifier outperforms autoregressive reasoning with state-of-the-art GPT models, even when instantiated with relatively small language models. This enables our runtime monitor to improve the trustworthiness of dynamic robotic systems, such as quadrotors or autonomous vehicles, under resource and time constraints. Videos illustrating our approach in both simulation and real-world experiments are available on the project page: https://sites.google.com/view/aesop-llm.
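The latency-aware idea of keeping every fallback reachable for as long as the slow reasoner might take, then committing to the one it selects, can be illustrated with a deliberately tiny toy. This is not the authors' MPC formulation. It assumes a 1-D robot with bounded acceleration, fallbacks that are simply stopping points ahead on its path, constant-velocity coasting as a stand-in for the shared plan executed while the reasoner runs, and a greedy one-step search over accelerations in place of trajectory optimization. All names (`State`, `jointly_feasible`, `choose_accel`, `simulate`) and numbers are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class State:
    x: float  # position along the path [m]
    v: float  # forward speed [m/s], kept non-negative


def step(s: State, u: float, dt: float) -> State:
    """Exact constant-acceleration update that stops at v = 0 instead of reversing."""
    if u < 0 and s.v + u * dt < 0:
        return State(s.x + s.v**2 / (2 * -u), 0.0)
    return State(s.x + s.v * dt + 0.5 * u * dt**2, s.v + u * dt)


def stop_position(s: State, a_brake: float) -> float:
    """Where the robot comes to rest under maximum braking."""
    return s.x + s.v**2 / (2 * a_brake)


def jointly_feasible(s: State, fallbacks, a_brake: float, dt: float, wait_steps: int) -> bool:
    """Can the robot coast for `wait_steps` (slow reasoner still thinking) and then
    still brake to rest before EVERY fallback point?"""
    for _ in range(wait_steps):
        s = step(s, 0.0, dt)
    return all(stop_position(s, a_brake) <= r + 1e-9 for r in fallbacks)


def choose_accel(s, fallbacks, a_brake, a_max, dt, wait_steps, n=41) -> float:
    """Greedy one-step stand-in for MPC: fastest acceleration that keeps joint feasibility."""
    for u in np.linspace(a_max, -a_brake, n):
        if jointly_feasible(step(s, u, dt), fallbacks, a_brake, dt, wait_steps):
            return float(u)
    return -a_brake  # full braking never moves the stopping point


def simulate(fallbacks, choice, detect_step, latency_steps,
             n_steps=400, dt=0.05, a_brake=4.0, a_max=2.0):
    s, trace = State(0.0, 0.0), []
    for t in range(n_steps):
        if t < detect_step:  # nominal: stay ready for a full reasoning delay
            active, wait = fallbacks, latency_steps
        elif t < detect_step + latency_steps:  # anomaly flagged, reasoner still running
            active, wait = fallbacks, detect_step + latency_steps - t - 1
        else:  # reasoner returned: commit to the selected fallback
            active, wait = [fallbacks[choice]], 0
        s = step(s, choose_accel(s, active, a_brake, a_max, dt, wait), dt)
        trace.append(s)
    return trace


# Anomaly flagged at step 100; the slow reasoner answers 20 steps later with fallback 1.
trace = simulate([20.0, 35.0], choice=1, detect_step=100, latency_steps=20)
print(max(st.x for st in trace[:120]), trace[-1].x)
```

Maximum braking leaves the stopping point unchanged and only lowers the speed carried through the coasting window, so full braking satisfies the constraint whenever the previous state did, and the greedy search never runs out of safe actions. That recursive-feasibility property is what lets the toy robot keep making progress during the reasoner's delay without giving up any fallback before the decision arrives.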
