Multimodal Anomaly Detection based on Deep Auto-Encoder for Object Slip Perception of Mobile Manipulation Robots
Youngjae Yoo, Chung-Yeon Lee, Byoung-Tak Zhang
TL;DR
The paper tackles robust slip perception for mobile manipulation robots by fusing RGB, depth, audio, and force–torque data through a deep autoencoder-based anomaly-detection framework. It introduces a data integration pipeline that synchronizes and normalizes heterogeneous modalities, and uses a 5-layer encoder–decoder with a 100-unit bottleneck to learn normal latent representations, scoring anomalies via Normalized Aggregation along Pathway (NAP). Evaluation on a mobile service robot with diverse household objects and controlled noise demonstrates that multimodal sensing outperforms unimodal inputs across metrics (AUROC/AUPRC/F1), with real-time performance around 29 ms per inference. The results highlight the complementary value of sensors for robust slip detection in dynamic real-world environments and point to future work on broader object sets and more complex scenarios.
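The synchronize-and-normalize step mentioned above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names (`resample_to`, `zscore`, `fuse`), the use of linear interpolation, and per-feature z-scoring are all assumptions, and each modality is assumed to be already reduced to a `[T, D]` feature array with its own timestamps.

```python
import numpy as np

def resample_to(timestamps, values, target_times):
    """Linearly interpolate each feature column onto a shared time base."""
    values = np.asarray(values, dtype=float)
    columns = [
        np.interp(target_times, timestamps, values[:, j])
        for j in range(values.shape[1])
    ]
    return np.stack(columns, axis=1)

def zscore(x, eps=1e-8):
    """Scale each feature to zero mean and unit variance.

    Assumption: statistics are taken from the array itself; in practice they
    would more likely be estimated once on normal (non-slip) training data.
    """
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)

def fuse(streams, target_times):
    """streams: dict mapping a modality name to (timestamps, values[T, D])."""
    parts = [
        zscore(resample_to(t, v, target_times))
        for t, v in streams.values()
    ]
    # One row per shared time step, all modalities side by side.
    return np.concatenate(parts, axis=1)
```

Each row of the fused array can then be fed to the autoencoder as a single multimodal input vector.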
Abstract
Object slip perception is essential for mobile manipulation robots to perform manipulation tasks reliably in the dynamic real world. Traditional approaches to slip perception for robot arms use tactile or vision sensors. However, mobile robots must also deal with noise in their sensor signals caused by the robot's movement in a changing environment. To solve this problem, we present an anomaly detection method that utilizes multisensory data based on a deep autoencoder model. The proposed framework integrates heterogeneous data streams collected from various robot sensors, including RGB and depth cameras, a microphone, and a force-torque sensor. The integrated data is used to train a deep autoencoder to construct latent representations of the multisensory data that indicate the normal status. Anomalies are then identified by error scores that measure the difference between the latent values the trained encoder produces for the input and those it produces for the reconstructed input. To evaluate the proposed framework, we conducted an experiment that mimics an object slip by a mobile service robot operating in a real-world environment with diverse household objects and different moving patterns. The experimental results verified that the proposed framework reliably detects anomalies in object slip situations across various object types and robot behaviors, and despite visual and auditory noise in the environment.
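The latent-space error score described in the abstract can be illustrated with a short sketch. It is a simplified, unnormalized version under stated assumptions: `encoder_layers` (a list of callables, one per encoder layer) and `decoder` (a callable mapping the bottleneck back to input space) are hypothetical stand-ins for a trained model, and the score is a plain sum of squared differences rather than the normalized aggregation the paper uses.

```python
import numpy as np

def encoder_activations(x, encoder_layers):
    """Return the output of every encoder layer for a batch x of shape [N, D]."""
    activations = []
    h = x
    for layer in encoder_layers:
        h = layer(h)
        activations.append(h)
    return activations

def latent_difference_score(x, encoder_layers, decoder):
    """Score each sample by how far its hidden activations move after a
    reconstruction round trip. Higher scores suggest an anomaly."""
    acts_input = encoder_activations(x, encoder_layers)
    # Reconstruct from the bottleneck (the last encoder activation).
    x_hat = decoder(acts_input[-1])
    # Pass the reconstruction through the same encoder.
    acts_recon = encoder_activations(x_hat, encoder_layers)
    diffs = [a - b for a, b in zip(acts_input, acts_recon)]
    stacked = np.concatenate([d.reshape(len(x), -1) for d in diffs], axis=1)
    # Unnormalized aggregation: sum of squared differences over all layers.
    return np.sum(stacked ** 2, axis=1)
```

A sample would be flagged as a slip when its score exceeds a threshold chosen on normal data. How that threshold is set is not stated in this section.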
