RC-NF: Robot-Conditioned Normalizing Flow for Real-Time Anomaly Detection in Robotic Manipulation

Shijie Zhou, Bin Zhu, Jiarui Yang, Xiangyu Zhao, Jingjing Chen, Yu-Gang Jiang

Abstract

Recent advances in Vision-Language-Action (VLA) models have enabled robots to execute increasingly complex tasks. However, VLA models trained through imitation learning struggle to operate reliably in dynamic environments and often fail under Out-of-Distribution (OOD) conditions. To address this issue, we propose Robot-Conditioned Normalizing Flow (RC-NF), a real-time monitoring model for robotic anomaly detection and intervention that ensures the robot's state and the object's motion trajectory remain aligned with the task. RC-NF decouples the processing of task-aware robot and object states within the normalizing flow. It requires only positive (normal) samples for unsupervised training and, at inference time, computes accurate anomaly scores from the learned probability density function. We further present LIBERO-Anomaly-10, a benchmark comprising three categories of robotic anomalies for evaluation in simulation. RC-NF achieves state-of-the-art performance across all anomaly types compared with previous methods for monitoring robotic tasks. Real-world experiments show that RC-NF operates as a plug-and-play module for VLA models (e.g., π0), providing a real-time OOD signal that enables state-level rollback or task-level replanning when necessary, with a response latency under 100 ms. These results indicate that RC-NF noticeably enhances the robustness and adaptability of VLA-based robotic systems in dynamic environments.
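The abstract says anomaly scores come from the probability density of a normalizing flow. For reference, the standard change-of-variables formulation is sketched below. The notation ($x$ for the monitored state, $c$ for the robot and task condition, $f_1, \dots, f_K$ for the invertible transformations, a $D$-dimensional standard Gaussian base density $p_Z$) is a generic assumption, not necessarily the paper's own symbols or exact equations.

```latex
h_0 = x, \qquad h_k = f_k(h_{k-1}; c) \quad (k = 1, \dots, K), \qquad z = h_K

\log p_X(x \mid c) = \log p_Z(z) + \sum_{k=1}^{K} \log \left| \det \frac{\partial f_k(h_{k-1}; c)}{\partial h_{k-1}} \right|

\log p_Z(z) = -\tfrac{1}{2} \lVert z \rVert_2^2 - \tfrac{D}{2} \log (2\pi), \qquad s(x \mid c) = -\log p_X(x \mid c)
```

Because the flow is fit only to normal executions, a state it assigns low density receives a high score $s(x \mid c)$, which can then be compared with a threshold.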
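The abstract describes a real-time OOD signal that triggers state-level rollback or task-level replanning. A minimal Python sketch of that control flow follows, under two assumptions the source does not confirm: (i) the threshold is calibrated as a high quantile of scores on normal validation rollouts, a common choice when only positive samples are available, and (ii) a hypothetical `task_level` flag tells the monitor whether the anomaly concerns the task itself or only the robot/object state. How RC-NF actually separates the two OOD types is not specified here.

```python
from enum import Enum
from typing import Optional, Sequence

import numpy as np


class Action(Enum):
    CONTINUE = "continue"
    ROLLBACK = "rollback"  # state-level OOD: return to an earlier in-distribution state
    REPLAN = "replan"      # task-level OOD: the task itself must be re-planned


def calibrate_threshold(normal_scores: Sequence[float], quantile: float = 0.99) -> float:
    """Assumed calibration: a high quantile of anomaly scores on normal rollouts."""
    return float(np.quantile(np.asarray(normal_scores, dtype=float), quantile))


def decide(score: float, threshold: float, task_level: bool) -> Action:
    """Map one anomaly score to an action; `task_level` is a hypothetical input."""
    if score <= threshold:
        return Action.CONTINUE
    return Action.REPLAN if task_level else Action.ROLLBACK


def first_alarm(scores: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first time step whose score exceeds the threshold, or None."""
    above = np.flatnonzero(np.asarray(scores, dtype=float) > threshold)
    return int(above[0]) if above.size else None
```

For a score trace like the one in Figure 4, `first_alarm(scores, threshold)` minus the time step marked $t_{\text{anomaly}}$ would give a detection delay in time steps.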

Paper Structure (19 sections, 16 equations, 11 figures, 2 tables, 1 algorithm)

Figures (11)

  • Figure 1: We propose the Robot-Conditioned Normalizing Flow (RC-NF) model to monitor in real time whether the robot's execution state and the object's motion trajectory remain consistent with the task. To evaluate anomaly (i.e., Out-of-Distribution, OOD) detection performance, we introduce LIBERO-Anomaly-10, which includes three common robotic anomalies. Real-world experiments further demonstrate that RC-NF enhances the adaptability of VLA models (e.g., $\pi_0$).
  • Figure 2: Overview of our framework. Our Robot-Conditioned Normalizing Flow (RC-NF) operates as a real-time runtime monitor for robotic manipulation tasks. (Left) SAM2 extracts object masks from the video stream, which are then grid-sampled into point sets. Task prompts are encoded using spherical uniform encoding, and robot proprioception provides joint, gripper, and pose states. (Center) RC-NF uses these signals as conditions within the affine coupling layers of the proposed RCPQNet (see the RCPQNet section) to apply $K$ invertible transformations and compute anomaly scores for the current task. (Right) When the anomaly score exceeds the threshold, the Anomaly Detection and Handling module triggers corrective behavior: task replanning for task-level OOD and task rollback for state-level OOD.
  • Figure 3: The Robot-Conditioned Point Query Network (RCPQNet), used as the affine coupling layer in RC-NF, generates the shift and scale parameters.
  • Figure 4: Visualization of the anomaly scores for the task "Pick up the book and place it in the back compartment of the caddy." The x-axis denotes time steps, and the y-axis indicates the anomaly score. The red dashed line marks $t_{\text{anomaly}}$.
  • Figure 5: A real-world comparison of $\pi_0$ and $\pi_0$ + RC-NF (ours) on the task "place the ball into the open drawer" when the drawer closes midway.
  • ...and 6 more figures
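Figures 2 and 3 describe affine coupling layers whose shift and scale parameters are produced by a conditioning network (RCPQNet) from robot and task signals. The NumPy sketch below shows a generic conditional affine coupling layer and the resulting negative-log-likelihood anomaly score. The conditioner here is a fixed random linear map with a tanh on the log-scale, a hypothetical stand-in for RCPQNet; the names `ConditionalAffineCoupling` and `anomaly_score`, the half split, and the feature reversal between layers are illustrative assumptions, and training is not shown.

```python
import numpy as np


class ConditionalAffineCoupling:
    """Affine coupling: y_a = x_a, y_b = x_b * exp(s) + t, with (s, t) = g(x_a, c).

    g is a fixed random linear map (hypothetical stand-in for RCPQNet)."""

    def __init__(self, dim: int, cond_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.d = dim // 2                      # size of the pass-through half x_a
        in_dim, out_dim = self.d + cond_dim, dim - self.d
        self.W_s = rng.normal(scale=0.1, size=(in_dim, out_dim))
        self.W_t = rng.normal(scale=0.1, size=(in_dim, out_dim))

    def _shift_scale(self, x_a, c):
        h = np.concatenate([x_a, c], axis=-1)  # condition on untouched half + robot/task signal
        log_scale = np.tanh(h @ self.W_s)      # bounded log-scale keeps exp() stable
        shift = h @ self.W_t
        return log_scale, shift

    def forward(self, x, c):
        x_a, x_b = x[..., :self.d], x[..., self.d:]
        log_scale, shift = self._shift_scale(x_a, c)
        y = np.concatenate([x_a, x_b * np.exp(log_scale) + shift], axis=-1)
        # The Jacobian is block-triangular, so log|det J| is the sum of the log-scales.
        return y, log_scale.sum(axis=-1)

    def inverse(self, y, c):
        y_a, y_b = y[..., :self.d], y[..., self.d:]
        log_scale, shift = self._shift_scale(y_a, c)
        return np.concatenate([y_a, (y_b - shift) * np.exp(-log_scale)], axis=-1)


def anomaly_score(x, c, layers):
    """Negative log-likelihood of x given c under a standard Gaussian base density."""
    z, total_log_det = x, np.zeros(x.shape[:-1])
    for layer in layers:
        z, log_det = layer.forward(z, c)
        total_log_det = total_log_det + log_det
        # Reverse feature order (|det| = 1) so the next layer also updates
        # the features this layer passed through unchanged.
        z = z[..., ::-1]
    dim = x.shape[-1]
    log_pz = -0.5 * np.sum(z ** 2, axis=-1) - 0.5 * dim * np.log(2.0 * np.pi)
    return -(log_pz + total_log_det)
```

In a monitoring loop, `anomaly_score` would be evaluated on each new observation and compared with a threshold; in practice the conditioner would be a trained network whose parameters are fit by maximizing log-likelihood on normal executions only.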
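Figure 2 (Left) states that SAM2 object masks are grid-sampled into point sets. One simple reading is sketched below under that assumption: lay a regular pixel grid with a hypothetical `stride` over the mask and keep the grid points that fall on the object, optionally normalizing coordinates to [0, 1]. The function name, stride, (x, y) coordinate order, and normalization are illustrative choices, not the paper's settings.

```python
import numpy as np


def mask_to_grid_points(mask, stride: int = 8, normalize: bool = True) -> np.ndarray:
    """Return an (N, 2) array of (x, y) grid points lying inside a binary mask."""
    mask = np.asarray(mask, dtype=bool)
    height, width = mask.shape
    # Regular grid of pixel coordinates, spaced `stride` pixels apart.
    ys, xs = np.meshgrid(np.arange(0, height, stride), np.arange(0, width, stride), indexing="ij")
    inside = mask[ys, xs]  # which grid points fall on the object
    points = np.stack([xs[inside], ys[inside]], axis=-1).astype(float)
    if normalize:
        # Map pixel coordinates to [0, 1] so point sets are resolution-independent.
        points /= np.array([max(width - 1, 1), max(height - 1, 1)], dtype=float)
    return points
```

An empty mask yields an empty `(0, 2)` array, so a downstream model would need to handle variable-size (including empty) point sets, for example by padding or by treating a missing object as its own signal.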