
Multimodal Anomaly Detection based on Deep Auto-Encoder for Object Slip Perception of Mobile Manipulation Robots

Youngjae Yoo, Chung-Yeon Lee, Byoung-Tak Zhang

TL;DR

The paper tackles robust slip perception for mobile manipulation robots by fusing RGB, depth, audio, and force–torque data in a deep-autoencoder-based anomaly-detection framework. It introduces a data integration pipeline that synchronizes and normalizes the heterogeneous modalities, trains a 5-layer encoder–decoder with a 100-unit bottleneck to learn latent representations of normal operation, and scores anomalies with Normalized Aggregation along Pathway (NAP). In experiments on a mobile service robot handling diverse household objects under controlled noise, multimodal input outperforms unimodal input on AUROC, AUPRC, and F1, and inference takes about 29 ms, fast enough for real-time use. The results highlight the complementary value of the sensors for robust slip detection in dynamic real-world environments; future work includes broader object sets and more complex scenarios.
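
NAP scores an input by comparing the encoder's hidden activations for the input with those for its reconstruction, then normalizing the differences using statistics from normal data. The Python sketch below is a simplified, diagonal variant and is not the authors' code: the original NAP (from the RaPP novelty-detection method) whitens the differences with an SVD of the training set, whereas this sketch standardizes each hidden unit independently. It assumes the per-layer activations are already available as lists of floats; `pathway_difference`, `DiagonalNAP`, and `auroc` are hypothetical helpers. The `auroc` helper shows how one of the reported metrics can be computed from the resulting scores.

```python
import math
from typing import List, Sequence

Vector = List[float]


def pathway_difference(h_input: Sequence[Vector], h_recon: Sequence[Vector]) -> Vector:
    """Concatenate the per-layer differences h_l(x) - h_l(x_hat) along the encoder pathway."""
    diff: Vector = []
    for layer_x, layer_xhat in zip(h_input, h_recon):
        diff.extend(a - b for a, b in zip(layer_x, layer_xhat))
    return diff


class DiagonalNAP:
    """Simplified NAP: standardize each pathway unit with statistics from normal data.

    Assumption: the original NAP whitens the differences with an SVD of the
    training set; this sketch uses per-unit mean/std normalization instead.
    """

    def __init__(self, eps: float = 1e-8) -> None:
        self.eps = eps
        self.mean: Vector = []
        self.std: Vector = []

    def fit(self, normal_diffs: Sequence[Vector]) -> "DiagonalNAP":
        # Per-unit mean and population standard deviation over normal samples only.
        n = len(normal_diffs)
        columns = list(zip(*normal_diffs))
        self.mean = [sum(col) / n for col in columns]
        self.std = [
            math.sqrt(sum((v - m) ** 2 for v in col) / n)
            for col, m in zip(columns, self.mean)
        ]
        return self

    def score(self, diff: Vector) -> float:
        # Sum of squared standardized differences; larger means more anomalous.
        return sum(
            ((d - m) / (s + self.eps)) ** 2
            for d, m, s in zip(diff, self.mean, self.std)
        )


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via pairwise ranking: P(anomaly score > normal score), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

In this sketch, `DiagonalNAP` is fitted only on pathway differences from normal (non-slip) data, so slip events show up as large scores; AUPRC and F1 could be computed from the same scores once a threshold is chosen.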

Abstract

Object slip perception is essential for mobile manipulation robots to perform manipulation tasks reliably in the dynamic real world. Traditional approaches to slip perception for robot arms use tactile or vision sensors. However, mobile robots still have to deal with sensor noise caused by their own movement in a changing environment. To solve this problem, we present an anomaly detection method based on a deep autoencoder that uses multisensory data. The proposed framework integrates heterogeneous data streams collected from various robot sensors, including RGB and depth cameras, a microphone, and a force-torque sensor. The integrated data is used to train a deep autoencoder to construct latent representations of the multisensory data that indicate the normal status. Anomalies can then be identified by error scores measured as the difference between the latent values the trained encoder produces for the input data and those it produces for the reconstructed input data. To evaluate the proposed framework, we conducted an experiment that mimics an object slip by a mobile service robot operating in a real-world environment with diverse household objects and different movement patterns. The experimental results verified that the proposed framework reliably detects anomalies in object slip situations despite variations in object type and robot behavior, as well as visual and auditory noise in the environment.
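
The integration step described above, synchronizing and normalizing streams from the cameras, microphone, and force-torque sensor into one input, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes each modality has already been reduced to timestamped, fixed-length feature vectors, aligns every stream to a reference clock (for example, camera frame times) by nearest-timestamp matching, and applies per-feature min-max scaling fitted on training data. The helpers `synchronize`, `fit_min_max`, and `integrate` are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, List[float]]  # (timestamp in seconds, feature vector)
Frame = Dict[str, List[float]]      # modality name -> features at one reference time
Ranges = Dict[str, Tuple[List[float], List[float]]]


def synchronize(streams: Dict[str, Sequence[Sample]], ref_times: Sequence[float]) -> List[Frame]:
    """Align each time-sorted stream to the reference clock by nearest-timestamp matching."""
    times = {name: [t for t, _ in stream] for name, stream in streams.items()}
    frames: List[Frame] = []
    for t in ref_times:
        frame: Frame = {}
        for name, stream in streams.items():
            ts = times[name]
            i = bisect_left(ts, t)
            if i == 0:
                j = 0
            elif i == len(ts):
                j = len(ts) - 1
            else:
                # Pick the closer neighbour; ties go to the earlier sample.
                j = i - 1 if t - ts[i - 1] <= ts[i] - t else i
            frame[name] = stream[j][1]
        frames.append(frame)
    return frames


def fit_min_max(frames: Sequence[Frame]) -> Ranges:
    """Per-modality, per-feature min and max computed over training frames."""
    ranges: Ranges = {}
    for name in frames[0]:
        columns = list(zip(*(f[name] for f in frames)))
        ranges[name] = ([min(c) for c in columns], [max(c) for c in columns])
    return ranges


def integrate(frame: Frame, ranges: Ranges, order: Sequence[str]) -> List[float]:
    """Min-max scale each modality with the fitted ranges and concatenate in a fixed order.

    Values outside the training range map outside [0, 1].
    """
    out: List[float] = []
    for name in order:
        lo, hi = ranges[name]
        for v, a, b in zip(frame[name], lo, hi):
            # Constant features carry no information, so map them to 0.
            out.append((v - a) / (b - a) if b > a else 0.0)
    return out
```

Fitting the scaling ranges on normal training data only is a natural choice in this sketch, since the autoencoder is trained to represent the normal status; the concatenated vector is the kind of input a single autoencoder could then consume.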

Paper Structure (18 sections, 2 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Example of an object slip perception experiment
  • Figure 2: The overall architecture of the multimodal anomaly detection framework
  • Figure 3: An autoencoder model used in the proposed framework
  • Figure 4: Type and location of sensors on the HSR robot
  • Figure 5: Experimental Protocols for the slip detection of the mobile robot
  • ...and 3 more figures