
Real-Time Anomaly Detection and Reactive Planning with Large Language Models

Rohan Sinha, Amine Elhafsi, Christopher Agia, Matthew Foutter, Edward Schmerling, Marco Pavone

TL;DR

This work tackles real-time anomaly detection and safe reactive planning for autonomous robots under distributional shift. It introduces AESOP, a two-stage framework that uses LLM embeddings for fast anomaly detection and autoregressive LLM reasoning for hazard assessment. The fast stage embeds each observation and compares it against embeddings of nominal experiences to flag deviations quickly; the slow stage uses generative reasoning to decide whether a safety intervention is needed. Both stages are integrated into a model predictive control scheme that keeps multiple recovery trajectories jointly feasible, so the robot stays safe while the slow LLM is still reasoning. The authors show that embedding-based detectors, even with small models, can outperform high-capacity generative baselines in many scenarios, and that accounting for the latency of slow reasoning preserves both safety and task progress in simulated and real quadrotor experiments as well as in CARLA-based autonomous-vehicle experiments. The results indicate that on-device embedding-based monitoring, combined with latency-aware planning, can meaningfully improve the reliability and safety of agile robots; promising future directions include reducing latency, refining recovery-region selection, and enabling continual learning.
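The control flow summarized above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `Monitor`, its `step` method, the reply values (`"continue"` for an inconsequential anomaly, an integer index selecting a recovery plan), and the choice to commit to a default recovery plan if no reply arrives within `K` steps are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical slow-reasoner reply: "continue" = anomaly judged inconsequential,
# an int = index of the recovery plan to execute, None = no answer yet.
Reply = Union[str, int, None]


@dataclass
class Monitor:
    K: int                       # consensus horizon, assumed >= slow-reasoner latency (in steps)
    default_recovery: int = 0    # assumption: plan to commit to if no reply arrives within K steps
    mode: str = "nominal"        # "nominal" | "fallback" | "recovery"
    waited: int = 0              # steps spent in the fallback-safe state
    recovery: Optional[int] = None

    def step(self, fast_flag: bool, reply: Reply = None) -> str:
        if self.mode == "nominal":
            if fast_flag:
                # Fast detector fired: enter the fallback-safe state and (conceptually) query the slow LLM.
                self.mode, self.waited = "fallback", 0
        elif self.mode == "fallback":
            self.waited += 1
            if reply == "continue":
                # Slow reasoner says the anomaly is harmless: resume nominal operation.
                self.mode = "nominal"
            elif isinstance(reply, int):
                # Slow reasoner picked a recovery plan.
                self.mode, self.recovery = "recovery", reply
            elif self.waited >= self.K:
                # No answer within the consensus horizon: commit to a recovery plan (assumption).
                self.mode, self.recovery = "recovery", self.default_recovery
        return self.mode
```

Keeping the fallback state alive for up to `K` steps mirrors the role of the consensus horizon: every recovery option has to stay feasible until the slow reasoner answers.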

Abstract

Foundation models, e.g., large language models (LLMs), trained on internet-scale data possess zero-shot generalization capabilities that make them a promising technology towards detecting and mitigating out-of-distribution failure modes of robotic systems. Fully realizing this promise, however, poses two challenges: (i) mitigating the considerable computational expense of these models such that they may be applied online, and (ii) incorporating their judgement regarding potential anomalies into a safe control framework. In this work, we present a two-stage reasoning framework: First is a fast binary anomaly classifier that analyzes observations in an LLM embedding space, which may then trigger a slower fallback selection stage that utilizes the reasoning capabilities of generative LLMs. These stages correspond to branch points in a model predictive control strategy that maintains the joint feasibility of continuing along various fallback plans to account for the slow reasoner's latency as soon as an anomaly is detected, thus ensuring safety. We show that our fast anomaly classifier outperforms autoregressive reasoning with state-of-the-art GPT models, even when instantiated with relatively small language models. This enables our runtime monitor to improve the trustworthiness of dynamic robotic systems, such as quadrotors or autonomous vehicles, under resource and time constraints. Videos illustrating our approach in both simulation and real-world experiments are available on this project page: https://sites.google.com/view/aesop-llm.
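A fast embedding-space classifier of the kind described above can be sketched with NumPy. The sketch assumes (these details are not taken from the paper) that the anomaly score is the negative mean cosine similarity between an observation's embedding and its top-k most similar nominal embeddings, and that the threshold is the empirical 95th quantile of leave-one-out scores on the nominal set. The defaults `k=5` and `q=0.95` mirror the top-5 scoring and 95th-quantile threshold reported in the Figure 2 caption; the function names are hypothetical, and the embeddings would come from whichever language model is used.

```python
import numpy as np


def _unit(x):
    # Scale vectors to unit length so that dot products equal cosine similarities.
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def anomaly_score(query, nominal_bank, k=5):
    # Assumed rule: negative mean of the top-k cosine similarities to nominal embeddings.
    # Higher score = less similar to anything seen during nominal operation.
    sims = _unit(nominal_bank) @ _unit(query)
    return -float(np.sort(sims)[-k:].mean())


def calibrate_threshold(nominal_bank, k=5, q=0.95):
    # Assumed calibration: empirical q-quantile of leave-one-out scores on nominal data.
    bank = np.asarray(nominal_bank, dtype=float)
    scores = [anomaly_score(bank[i], np.delete(bank, i, axis=0), k) for i in range(len(bank))]
    return float(np.quantile(scores, q))


def is_anomalous(query, nominal_bank, threshold, k=5):
    # Flag the observation (and trigger the slow reasoner) when its score exceeds the threshold.
    return anomaly_score(query, nominal_bank, k) > threshold
```

Only the nominal embeddings and one scalar threshold need to be stored, which is what makes this stage cheap enough to run at every control step.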

Paper Structure (40 sections, 2 theorems, 6 equations, 16 figures, 9 tables, 1 algorithm)

Key Result

Theorem 1

Suppose that at $t=0$, the MPC in eq:modification-mpc is feasible for some set of recovery strategies $\mathcal{Y} \subset \{1, \dots, d\}$, i.e., that $J_0(\mathcal{Y}, K, T) < \infty$. Then, the closed-loop system formed by eq:dyn and Algorithm alg:safe-mpc ensures the following: 1) We satisfy sta…

Figures (16)

  • Figure 1: We present an embedding-based runtime monitoring scheme using fast and slow language model reasoners in concert. During nominal operation, the fast reasoner differentiates between nominal and anomalous robot observations. If an anomaly is flagged, the system enters a fallback-safe state while the slow reasoner determines the anomaly's hazard. In this fallback-safe state, we guarantee access to a set of safe recovery plans (if the anomaly is consequential) and access to continued nominal operation (if the anomaly is inconsequential).
  • Figure 2: Embedding-based (fast) anomaly detection results for the manipulation, autonomous vehicle, and VTOL domains. The top row plots anomaly detection accuracy as a function of the number of experiences sampled IID from each domain's dataset. The bottom row plots accuracy as a function of the number of concepts sampled from each domain's dataset. We use top-5 scoring with the anomaly detection threshold set at the 95th quantile (eq:emp-quantile) of the scores in the sampled data.
  • Figure 3: Closed-loop trajectory of a quadrotor using the AESOP algorithm, shown as a snapshot at $t=2.5\mathrm{s}$. The trajectory up to time $t$ is in black. The nominal trajectory plan is in blue, with a blue dot marking the first consensus constraint in eq:modification-mpc. The recovery trajectory plans overlap up to the consensus horizon $K$, which corresponds to the LLM latency, and are shown in orange over that interval. After $K$ timesteps the recovery plans diverge (shown in red), each reaching its respective recovery region (in green). The blue text callout shows the fast anomaly detector issuing a warning and triggering the slow reasoner at $t=2.5\mathrm{s}$. The red callout shows the slow reasoner's response, which the LLM returns within the $K$ consensus timesteps of the recovery plans.
  • Figure 4: Annotated depiction of our quadrotor hardware experiment. The quadrotor's goal is to land on the red box. In the event of an anomaly, it can either recover by landing on the blue box, or by hovering within the designated holding zone.
  • Figure 5: Embedding-based anomaly detection results for the manipulation, autonomous vehicle, and VTOL domains. The top row plots AUROC as a function of the number of experiences sampled IID from each domain's dataset. The bottom row plots accuracy as a function of the number of concepts sampled from each domain's dataset.
  • ...and 11 more figures
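The branching plan structure described in the Figure 3 caption can be made concrete with a small checker. This sketch assumes the consensus constraints act on control inputs: each recovery plan shares its first input with the nominal plan, and all recovery plans share their first `K` inputs with each other. Because all plans start from the same state, matching inputs imply overlapping state trajectories. The function `satisfies_consensus` is hypothetical; in the MPC itself these relations would be imposed as equality constraints rather than checked after the fact.

```python
import numpy as np


def satisfies_consensus(nominal_u, recovery_us, K, atol=1e-9):
    # nominal_u: (T, m) array of nominal control inputs.
    # recovery_us: list of (T, m) arrays, one per recovery plan.
    nominal_u = np.asarray(nominal_u, dtype=float)
    recs = [np.asarray(u, dtype=float) for u in recovery_us]
    if not recs:
        return True
    # First consensus constraint: every recovery plan applies the nominal plan's first input.
    for u in recs:
        if not np.allclose(u[0], nominal_u[0], atol=atol):
            return False
    # Consensus horizon: all recovery plans agree on their first K inputs,
    # so the robot can wait up to K steps for the slow reasoner before branching.
    ref = recs[0]
    for u in recs[1:]:
        if not np.allclose(u[:K], ref[:K], atol=atol):
            return False
    return True
```

Two recovery plans that agree on their first three inputs and then branch satisfy the check for `K=3` but not for `K=4`.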

Theorems & Definitions (5)

  • Theorem 1
  • proof
  • Definition 1: Control Invariant Set (Borrelli, Bemporad, et al., 2017)
  • Theorem 1
  • proof
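Definition 1 refers to the standard notion of a control invariant set: a set $C$ is control invariant for dynamics $x_{t+1} = f(x_t, u_t)$ if, for every $x \in C$, some admissible input $u$ keeps $f(x, u)$ in $C$. For finite state and input spaces this can be checked directly, and the largest control invariant subset of a set can be computed by repeatedly discarding states that have no admissible successor in the current set. The sketch below is a generic illustration of the definition for finite systems, not the paper's construction; the function names and the toy dynamics $f(x, u) = 2x + u$ are hypothetical.

```python
from typing import Callable, Hashable, Iterable, Set


def is_control_invariant(C: Set[Hashable], inputs: Iterable, f: Callable) -> bool:
    # C is control invariant if every state in C has at least one input keeping the next state in C.
    inputs = list(inputs)
    return all(any(f(x, u) in C for u in inputs) for x in C)


def maximal_control_invariant(X: Iterable[Hashable], inputs: Iterable, f: Callable) -> Set[Hashable]:
    # Fixed-point iteration C <- C intersect Pre(C): drop states that cannot stay inside the current set.
    inputs = list(inputs)
    C = set(X)
    while True:
        C_next = {x for x in C if any(f(x, u) in C for u in inputs)}
        if C_next == C:
            return C
        C = C_next
```

For the unstable toy dynamics $f(x, u) = 2x + u$ with inputs $\{-1, 0, 1\}$ on the integers $-5, \dots, 5$, the iteration shrinks the set step by step until only $\{-1, 0, 1\}$ remains, since every other state is eventually pushed out regardless of the input.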