Robust Anomaly Detection through Multi-Modal Autoencoder Fusion for Small Vehicle Damage Detection

Sara Khan, Mehmed Yüksel, Frank Kirchner

TL;DR

The paper tackles the problem of reliably detecting minor vehicle damage in fleet and car-sharing contexts, where vision-based systems struggle with underbody damage and with damage that occurs while the vehicle is moving. It introduces a windshield-mounted device that integrates IMU and microphone sensors and applies mid-fusion, autoencoder-based anomaly detection for real-time damage identification, reaching a ROC-AUC of $0.92$ with the pooling-based multi-modal architecture (MAA3). The study systematically compares mono- and multi-modal designs, analyses loss functions (favouring Log-Cosh), and shows that pooling-based fusion offers a favourable balance of accuracy and efficiency for edge deployment. It also reports cross-domain generalisability on an open robotics dataset after retraining, which suggests potential for broader safety applications, including integration with automotive safety systems and autonomous-vehicle sensing stacks.
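For readers unfamiliar with the Log-Cosh loss named above, the following is a minimal sketch of its commonly used definition, the mean of $\log(\cosh(x' - x))$ over a window. It is an illustration only: the function names are hypothetical, and the paper's exact formulation, weighting, and framework are not given in this summary.

```python
import math

def log_cosh(d: float) -> float:
    """log(cosh(d)), computed in a form that does not overflow for large |d|."""
    a = abs(d)
    # cosh(d) = e^a * (1 + e^(-2a)) / 2, so log(cosh(d)) = a + log1p(e^(-2a)) - log(2)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def log_cosh_loss(x, x_rec):
    """Mean log-cosh error between an input window x and its reconstruction x_rec."""
    return sum(log_cosh(r - v) for v, r in zip(x, x_rec)) / len(x)
```

For small residuals the loss behaves like half the squared error, and for large residuals it grows roughly linearly, which makes it less sensitive to outliers than mean squared error.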

Abstract

Wear and tear detection in fleet and shared vehicle systems is a critical challenge, particularly in rental and car-sharing services, where minor damage, such as dents, scratches, and underbody impacts, often goes unnoticed or is detected too late. Currently, manual inspection methods are the default approach, but are labour-intensive and prone to human error. In contrast, state-of-the-art image-based methods are less reliable when the vehicle is moving, and they cannot effectively capture underbody damage due to limited visual access and spatial coverage. This work introduces a novel multi-modal architecture based on anomaly detection to address these issues. Sensors such as Inertial Measurement Units (IMUs) and microphones are integrated into a compact device mounted on the vehicle's windshield. This approach supports real-time damage detection while avoiding the need for highly resource-intensive sensors. We developed multiple variants of multi-modal autoencoder-based architectures and evaluated them against unimodal and state-of-the-art methods. Our multi-modal ensemble model with pooling achieved the highest performance, with a Receiver Operating Characteristic-Area Under Curve (ROC-AUC) of 92%, demonstrating its effectiveness in real-world applications. This approach can also be extended to other applications, such as improving automotive safety. It can integrate with airbag systems for efficient deployment and help autonomous vehicles by complementing other sensors in collision detection.

Paper Structure

This paper contains 33 sections, 4 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: Examples of vehicle damages categorised by severity, adapted from Khan2023.
  • Figure 2: Schematic overview of multi-modal sensor fusion strategies. Early fusion combines raw IMU and audio signals before feature extraction; feature-level fusion merges learned features; and decision-level fusion integrates predictions at the output stage.
  • Figure 3: Structure of an autoencoder used for anomaly detection. The encoder compresses input signals ($X$) into a latent representation ($Z$), while the decoder reconstructs them ($X'$), enabling damage-related anomaly detection through reconstruction error.
  • Figure 4: Architecture of the Small Damage Detection (SDD) pipeline, showing sensor input, preprocessing, model inference, and cloud-based logging for confirmed events. The system triggers data capture when acceleration thresholds are exceeded.
  • Figure 5: Comparative feature characteristics of two damage types: (a) dent (top) and (b) scratch (bottom). The left column shows temporal variations, and the right column shows event-related features derived from accelerometer and audio signals.
  • ...and 7 more figures
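The captions for Figures 3 and 4 describe two decision steps: data capture is triggered when acceleration exceeds a threshold, and a window is treated as damage-related when the autoencoder reconstructs it poorly. A minimal sketch of both steps follows. Everything specific in it is an assumption rather than the paper's method: the function names, the use of acceleration magnitude in g, mean squared error as the reconstruction error, and the mean-plus-`k`-standard-deviations threshold rule. The `reconstruct` argument stands in for a trained `decoder(encoder(x))`.

```python
import math
from statistics import mean, stdev

def exceeds_accel_threshold(accel_xyz, threshold_g):
    """Trigger rule (assumed): True if any sample's acceleration magnitude exceeds threshold_g."""
    return any(
        math.sqrt(ax * ax + ay * ay + az * az) > threshold_g
        for ax, ay, az in accel_xyz
    )

def reconstruction_error(x, x_rec):
    """Mean squared error between a window x and its reconstruction x_rec."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)

def fit_threshold(normal_errors, k=3.0):
    """Assumed rule: mean + k * sample std of errors measured on damage-free windows."""
    return mean(normal_errors) + k * stdev(normal_errors)

def is_anomaly(x, reconstruct, threshold):
    """Flag a window whose reconstruction error exceeds the fitted threshold."""
    return reconstruction_error(x, reconstruct(x)) > threshold
```

In use, `fit_threshold` would be called once on reconstruction errors from damage-free recordings, and `is_anomaly` would then be applied only to windows for which `exceeds_accel_threshold` returned `True`, so the model runs on candidate events rather than on the full sensor stream.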