uTRAND: Unsupervised Anomaly Detection in Traffic Trajectories

Giacomo D'Amicantonio, Egor Bondarau, Peter H. N. de With

TL;DR

The paper tackles unsupervised anomaly detection in traffic trajectories by shifting from pixel-based analysis to a semantic-topological representation of intersections. It constructs a patch-based graph over semantically segmented BEV images (using ITF maps) and learns normal inter-node transitions to detect anomalous trajectories, enabling human-readable classification rules. The methodology combines automated camera calibration (GNN-based homographies), agent detection/tracking (YOLOv8 with BEV fusion), and a graph-based anomaly detector with attributes $S_{node_i}$, $T_{avg}$, and $A$, all operating without labeled anomaly data. On real-world multi-camera data, uTRAND outperforms state-of-the-art unsupervised methods in $F_1$ while providing interpretable explanations for detected anomalies.

Abstract

Deep learning-based approaches have achieved significant improvements on public video anomaly datasets, but often do not perform well in real-world applications. This paper addresses two issues: the lack of labeled data and the difficulty of explaining the predictions of a neural network. To this end, we present a framework called uTRAND that shifts the problem of anomalous trajectory prediction from the pixel space to a semantic-topological domain. The framework detects and tracks all types of traffic agents in bird's-eye-view videos from traffic cameras mounted at an intersection. By conceptualizing the intersection as a patch-based graph, the framework learns and models the normal behaviour of traffic agents without costly manual labeling. Furthermore, uTRAND allows simple rules to be formulated that classify anomalous trajectories in a way suited for human interpretation. We show that uTRAND outperforms other state-of-the-art approaches on a dataset of anomalous trajectories collected in a real-world setting, while producing explainable detection results.
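The idea of learning normal behaviour on a patch-based graph can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each trajectory has already been reduced to a sequence of patch (node) IDs, that training trajectories are predominantly normal, and that a link is kept when it carries at least a chosen share of a node's outgoing traffic. The node names, the `min_share` threshold, and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def learn_transitions(trajectories):
    """Count node-to-node transitions in (assumed normal) training trajectories."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for src, dst in zip(traj, traj[1:]):
            if src != dst:  # ignore frames where the agent stays in the same patch
                counts[src][dst] += 1
    return counts


def build_links(counts, min_share=0.05):
    """Keep src -> dst only if it carries at least min_share of src's outgoing traffic.

    min_share is an illustrative threshold, not a value from the paper.
    """
    links = {}
    for src, outgoing in counts.items():
        total = sum(outgoing.values())
        links[src] = {dst for dst, n in outgoing.items() if n / total >= min_share}
    return links


def unseen_transitions(traj, links):
    """Return the transitions of a trajectory that are not among the learned links."""
    return [
        (src, dst)
        for src, dst in zip(traj, traj[1:])
        if src != dst and dst not in links.get(src, set())
    ]


# Hypothetical training data: sequences of patch IDs visited by tracked agents.
normal = (
    [["road_1", "road_2", "road_3"]] * 20
    + [["road_1", "road_2", "road_4"]] * 5
    + [["curb_1", "crosswalk_1", "curb_2"]] * 10
)
links = build_links(learn_transitions(normal))

usual = unseen_transitions(["road_1", "road_2", "road_3"], links)
wrong_way = unseen_transitions(["road_3", "road_2", "road_1"], links)
print(usual)      # []
print(wrong_way)  # [('road_3', 'road_2'), ('road_2', 'road_1')]
```

Because the output is a list of specific unseen transitions, a flagged trajectory can be explained in terms of the patches involved (for example, driving against the learned direction of a lane), which is the kind of human-readable rule the abstract refers to.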

Paper Structure (11 sections, 3 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Overview of uTRAND. In the first stage, the cameras are calibrated and the videos are warped to obtain their BEV videos. In the second stage, the traffic agents are detected and tracked in the camera video and translated to the BEV video. In the last stage, the intersection is split into semantic patches (nodes). The previously detected agents are associated with the patch they occupy at each frame. The links between the nodes and the normal behavior of an agent in each node are learned by processing a few hours of camera videos through the framework, modeling the dynamics of the intersection. An agent that deviates from the modeled behaviors is detected as anomalous. The anomaly can be further classified using simple rules that do not require domain knowledge.
  • Figure 2: The agents are initially detected in the camera view. YOLOv8 estimates a bounding box for each agent, assigns it an ID, and tracks it across frames. Subsequently, the detected bounding boxes are projected into the BEV, where the three-dimensional bounding boxes are detected, carrying over the ID assigned to the agent in the camera view.
  • Figure 3: The intersection is split into patches according to the ITF map. Each node belongs to one of four classes: road, bicycle lane, curb, or crosswalk. The framework establishes connections between the nodes that traffic agents most often move between.
  • Figure 4: Semantically segmented bird's-eye-view of the intersection (called BEV intersection). The fields-of-view of the three cameras overlap at the center of the intersection.
  • Figure 5: Camera views of the same intersection and the corresponding trajectory visualization. Top row: two frames of a vehicle performing improper turns. In one subfigure, the car turns right and then drives in the opposite lane, while in the other subfigure the same vehicle drives in a circle in the intersection. Bottom row: uTRAND's visualization of the trajectories of the two actions. Both trajectories are correctly detected and classified as anomalies.
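
The camera-to-BEV step described in the captions of Figures 1 to 3 can be sketched as follows. This is a simplified illustration rather than the paper's method: it assumes a 3×3 homography `H` is already available from calibration, represents each agent by the bottom-centre of its 2D bounding box (the paper instead works with three-dimensional boxes in the BEV), and assumes the segmented BEV intersection is stored as an integer label image `patch_map`. All names and numbers below are hypothetical.

```python
import numpy as np


def project_to_bev(points_xy, H):
    """Apply a 3x3 homography H to an (N, 2) array of image points."""
    pts = np.asarray(points_xy, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # back to Cartesian coordinates


def ground_point(box):
    """Bottom-centre of an (x1, y1, x2, y2) box, used as the assumed ground contact point."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, y2)


def patch_of(bev_xy, patch_map):
    """Look up the patch ID at a BEV position; returns -1 outside the map.

    patch_map is an integer label image indexed as [row, col].
    """
    col, row = int(round(bev_xy[0])), int(round(bev_xy[1]))
    if 0 <= row < patch_map.shape[0] and 0 <= col < patch_map.shape[1]:
        return int(patch_map[row, col])
    return -1


# Hypothetical homography: scale by 0.5, then translate by (10, 20).
H = np.array([[0.5, 0.0, 10.0],
              [0.0, 0.5, 20.0],
              [0.0, 0.0, 1.0]])

# Hypothetical 100x100 BEV label image with one patch labelled 3.
patch_map = np.zeros((100, 100), dtype=int)
patch_map[60:100, 50:100] = 3

box = (100, 50, 140, 120)                      # detection in the camera view
bev = project_to_bev([ground_point(box)], H)[0]
node = patch_of(bev, patch_map)
print(bev, node)                               # [70. 80.] 3
```

Running this lookup for every tracked agent in every frame yields one sequence of patch IDs per agent ID, which is the input a graph-based detector over the patches would consume.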