EngineAD: A Real-World Vehicle Engine Anomaly Detection Dataset

Hadi Hojjati, Christopher Roth, Rory Woods, Ken Sills, Narges Armanfard

Abstract

The progress of Anomaly Detection (AD) in safety-critical domains, such as transportation, is severely constrained by the lack of large-scale, real-world benchmarks. To address this, we introduce EngineAD, a novel, multivariate dataset comprising high-resolution sensor telemetry collected from a fleet of 25 commercial vehicles over a six-month period. Unlike synthetic datasets, EngineAD features authentic operational data labeled with expert annotations, distinguishing normal states from subtle indicators of incipient engine faults. We preprocess the data into $300$-timestep segments of $8$ principal components and establish an initial benchmark using nine diverse one-class anomaly detection models. Our experiments reveal significant performance variability across the vehicle fleet, underscoring the challenge of cross-vehicle generalization. Furthermore, our findings corroborate recent literature, showing that simple classical methods (e.g., K-Means and One-Class SVM) are often highly competitive with, or superior to, deep learning approaches in this segment-based evaluation. By publicly releasing EngineAD, we aim to provide a realistic, challenging resource for developing robust and field-deployable anomaly detection and anomaly prediction solutions for the automotive industry.

Paper Structure

This paper contains 16 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Overview of Data Recording Process. Sensor signals from the vehicles were transmitted via the CAN bus and captured using a proprietary logging device connected to the network. Following data collection, a team of technicians systematically analyzed the sensor traces and maintenance information to assign ground-truth labels.
  • Figure 2: Preprocessing pipeline for vehicle sensor data. Thirteen sensors were selected with technician input. Data were cleaned, engine-on periods identified, resampled to 1 Hz, and segmented into non-overlapping five-minute windows. Incomplete or missing segments were removed, and each vehicle’s labeled segments were saved for analysis.
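
The segmentation step in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' code. It assumes the input is one engine-on period already resampled to 1 Hz, represented as a list of rows with one value per selected sensor, and that a missing reading is encoded as `None` or NaN. The names `segment_windows`, `is_missing`, and `WINDOW_SECONDS` are hypothetical.

```python
import math

# Five minutes at 1 Hz gives the 300-timestep segments mentioned in the abstract.
WINDOW_SECONDS = 300


def is_missing(value):
    """Treat None and NaN as missing readings (an assumed encoding)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def segment_windows(rows, window=WINDOW_SECONDS):
    """Split 1 Hz rows into non-overlapping windows of `window` samples.

    A trailing partial window is dropped, as is any window that contains
    a missing reading.
    """
    segments = []
    for start in range(0, len(rows) - window + 1, window):
        segment = rows[start:start + window]
        if any(is_missing(v) for row in segment for v in row):
            continue
        segments.append(segment)
    return segments


# Synthetic example: 650 seconds of 13 channels with one missing reading.
rows = [[float(t + c) for c in range(13)] for t in range(650)]
rows[310][4] = None
segments = segment_windows(rows)
```

In this synthetic example the first window (seconds 0 to 299) is kept, the second (seconds 300 to 599) is dropped because of the missing reading, and the final 50 seconds are dropped as an incomplete window.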