
An Expert Ensemble for Detecting Anomalous Scenes, Interactions, and Behaviors in Autonomous Driving

Tianchen Ji, Neeloy Chakraborty, Andre Schreiber, Katherine Driggs-Campbell

TL;DR

This work targets online anomaly detection for autonomous driving using a monocular camera. It decomposes anomalies into three patterns and builds three unsupervised experts: a scene expert (frame-level), an interaction expert (relative motions between pairs of road participants), and a behavior expert (future trajectory prediction). An expert ensemble (Xen) based on a Kalman filter fuses their scores into a single online anomaly score, with normalization and thresholds learned from normal data. Evaluated on the large-scale DoTA dataset with a realistic evaluation protocol, the approach outperforms baselines and shows promise for unsupervised classification of anomaly types. The results demonstrate the practical potential of multi-expert, online anomaly detection to enhance safety in diverse driving scenarios and hint at future integration with richer modalities and foundation models.
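Learning normalization and thresholds from normal data alone could look like the following sketch. The summary does not say which statistics are used, so z-score normalization of each expert's score and a nearest-rank percentile threshold are assumptions here, and the function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def fit_normalizer(normal_scores: Sequence[float]) -> Tuple[float, float]:
    """Estimate mean and std of one expert's scores on normal (anomaly-free) data."""
    mu = statistics.mean(normal_scores)
    sigma = statistics.pstdev(normal_scores)
    # Guard against a constant score, which would otherwise divide by zero.
    return mu, (sigma if sigma > 0 else 1.0)


def normalize(score: float, mu: float, sigma: float) -> float:
    """Z-score an online expert score so that all experts share a common scale."""
    return (score - mu) / sigma


def fit_threshold(normal_scores: Sequence[float], percentile: float = 99.0) -> float:
    """Nearest-rank percentile of the scores observed on normal data (assumed rule)."""
    if not normal_scores:
        raise ValueError("need at least one normal score")
    if not 0.0 < percentile <= 100.0:
        raise ValueError("percentile must be in (0, 100]")
    ranked = sorted(normal_scores)
    rank = math.ceil(percentile * len(ranked) / 100.0)  # 1-based rank
    return ranked[rank - 1]


def is_anomalous(score: float, threshold: float) -> bool:
    """Raise an alarm when the score exceeds the threshold fitted on normal data."""
    return score > threshold
```

Because only normal driving is used to fit these statistics, no anomaly labels are needed, which matches the unsupervised setting; the percentile then directly controls the false-alarm rate on normal data.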

Abstract

As automated vehicles enter public roads, safety in a near-infinite number of driving scenarios becomes one of the major concerns for the widespread adoption of fully autonomous driving. The ability to detect anomalous situations outside of the operational design domain is a key component in self-driving cars, enabling us to mitigate the impact of abnormal ego behaviors and to realize trustworthy driving systems. On-road anomaly detection in egocentric videos remains a challenging problem due to the difficulties introduced by complex and interactive scenarios. We conduct a holistic analysis of common on-road anomaly patterns, from which we propose three unsupervised anomaly detection experts: a scene expert that focuses on frame-level appearances to detect abnormal scenes and unexpected scene motions; an interaction expert that models normal relative motions between two road participants and raises alarms whenever anomalous interactions emerge; and a behavior expert that monitors abnormal behaviors of individual objects by future trajectory prediction. To combine the strengths of all the modules, we propose an expert ensemble (Xen) using a Kalman filter, in which the final anomaly score is absorbed as one of the states and the observations are generated by the experts. Our experiments employ a novel evaluation protocol that gives a realistic measure of model performance, demonstrate better anomaly detection performance than previous methods, and show that our framework has potential for classifying anomaly types through unsupervised learning on a large-scale on-road anomaly dataset.
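The Kalman-filter fusion described in the abstract can be illustrated with a minimal filter. This is a sketch under stated assumptions, not the authors' Xen implementation: it keeps only the fused score as a scalar state with a random-walk motion model, treats each expert's (already normalized) score as a direct noisy measurement of that state, and takes hand-picked noise variances `q` and `r`. The class name `ScalarKalmanFusion` and all parameter values are hypothetical; the paper's state vector holds the final score as one of several states, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScalarKalmanFusion:
    """Hypothetical sketch: fuse several expert scores into one anomaly score.

    State: fused score s_t, modeled as a random walk s_t = s_{t-1} + w_t,
    with w_t ~ N(0, q).
    Observations: each expert score z_i = s_t + v_i, with v_i ~ N(0, r_i).
    """
    q: float               # process noise variance (assumed)
    r: Sequence[float]     # per-expert observation noise variances (assumed)
    s: float = 0.0         # current score estimate
    p: float = 1.0         # variance of the current estimate

    def step(self, z: Sequence[float]) -> float:
        if len(z) != len(self.r):
            raise ValueError("expected one score per expert")
        # Predict: the random-walk model keeps the mean and inflates the variance.
        self.p += self.q
        # Update: with independent observation noise, processing the experts
        # one at a time yields the same posterior as a joint update.
        for z_i, r_i in zip(z, self.r):
            k = self.p / (self.p + r_i)  # Kalman gain for this expert
            self.s += k * (z_i - self.s)
            self.p *= 1.0 - k
        return self.s
```

Calling `step` once per frame with the three expert scores yields an online fused score. An expert with a larger `r_i` (noisier on normal data) pulls the estimate less, and a small `q` smooths the score over time at the cost of reacting more slowly to a sudden anomaly.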


Paper Structure

This paper contains 24 sections, 30 equations, 11 figures, 8 tables.

Figures (11)

  • Figure 1: Challenging examples of anomalies. Sample frames are ordered in time. Bounding boxes in each frame mark the anomaly participants.
  • Figure 2: Left: Nodes are possible actors in an on-road anomaly. Each edge represents a type of anomaly happening between the two connected nodes. The edges are grouped by colors into three categories to guide our anomaly detector design. Right: Three experts are proposed for detecting different types of anomalies based on the anomaly pattern analysis. Individual expert scores are fused by a Kalman filter to generate a comprehensive final score.
  • Figure 3: Model architecture of OFP. Each box corresponds to a multi-channel feature map. The number of channels is denoted on top of the box. The spatial resolution is provided at the lower left edge of the box. Arrows with different colors denote different operations.
  • Figure 4: Model architecture of FFP. Each circle and box represents an operation. The last number in the box of each convolution layer denotes the number of filters.
  • Figure 5: Model architecture of STR. The optical flow and disparity map are concatenated and reconstructed by a fully convolutional autoencoder. The annotations can be interpreted in the same way as those in Figure 3.
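Figure 5's caption describes the STR input as the optical flow and the disparity map concatenated along the channel axis. The sketch below shows that concatenation and one common way to turn an autoencoder's output into an anomaly score, the mean squared reconstruction error. The error metric, the pure-Python nested-list layout, and the function names are assumptions, and the autoencoder itself is not implemented: its reconstruction is passed in.

```python
from typing import List

Channel = List[List[float]]  # one H x W map, stored row by row


def stack_channels(flow_u: Channel, flow_v: Channel, disparity: Channel) -> List[Channel]:
    """Concatenate the two optical-flow components and the disparity map
    along the channel axis, giving a 3 x H x W input (hypothetical layout)."""
    h, w = len(disparity), len(disparity[0])
    for ch in (flow_u, flow_v, disparity):
        if len(ch) != h or any(len(row) != w for row in ch):
            raise ValueError("all maps must share the same H x W resolution")
    return [flow_u, flow_v, disparity]


def reconstruction_score(x: List[Channel], x_hat: List[Channel]) -> float:
    """Mean squared error between the input and the autoencoder output,
    averaged over every channel and pixel (higher means more anomalous)."""
    if len(x) != len(x_hat):
        raise ValueError("channel count mismatch")
    total, count = 0.0, 0
    for ch, ch_hat in zip(x, x_hat):
        if len(ch) != len(ch_hat):
            raise ValueError("height mismatch")
        for row, row_hat in zip(ch, ch_hat):
            if len(row) != len(row_hat):
                raise ValueError("width mismatch")
            for a, b in zip(row, row_hat):
                total += (a - b) ** 2
                count += 1
    return total / count
```

The intuition is that an autoencoder trained only on normal driving reconstructs normal motion and depth patterns well, so frames whose flow or disparity it cannot reproduce receive a higher score.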