Conditional Flow Matching for Continuous Anomaly Detection in Autonomous Driving on a Manifold-Aware Spectral Space

Antonio Guillen-Perez

TL;DR

This work presents Deep-Flow, an unsupervised anomaly-detection framework for Level 4 autonomous driving that models the continuous density of expert behavior using Optimal Transport Conditional Flow Matching (OT-CFM) on a low-rank spectral manifold derived from PCA. The architecture combines an Early Fusion Transformer encoder with lane-aware goal conditioning and an exact log-likelihood computation via the Jacobian trace, enabling stable and interpretable anomaly scores. Key contributions include the spectral manifold bottleneck (k=12), kinematic-complexity weighting to emphasize rare, high-energy maneuvers, and a Discovery Engine that surfaces semantic, non-normative behaviors overlooked by traditional safety filters. Evaluated on the Waymo Open Motion Dataset, Deep-Flow achieves an AUC-ROC of 0.766 against a heuristic golden set of safety-critical events and reveals a distinct separation between kinematic danger and semantic non-compliance, supporting objective, data-driven safety gates for fleet validation and deployment.

Abstract

Safety validation for Level 4 autonomous vehicles (AVs) is currently bottlenecked by the inability to scale the detection of rare, high-risk long-tail scenarios using traditional rule-based heuristics. We present Deep-Flow, an unsupervised framework for safety-critical anomaly detection that utilizes Optimal Transport Conditional Flow Matching (OT-CFM) to characterize the continuous probability density of expert human driving behavior. Unlike standard generative approaches that operate in unstable, high-dimensional coordinate spaces, Deep-Flow constrains the generative process to a low-rank spectral manifold via a Principal Component Analysis (PCA) bottleneck. This ensures kinematic smoothness by design and enables the computation of the exact Jacobian trace for numerically stable, deterministic log-likelihood estimation. To resolve multi-modal ambiguity at complex junctions, we utilize an Early Fusion Transformer encoder with lane-aware goal conditioning, featuring a direct skip-connection to the flow head to maintain intent-integrity throughout the network. We introduce a kinematic complexity weighting scheme that prioritizes high-energy maneuvers (quantified via path tortuosity and jerk) during the simulation-free training process. Evaluated on the Waymo Open Motion Dataset (WOMD), our framework achieves an AUC-ROC of 0.766 against a heuristic golden set of safety-critical events. More significantly, our analysis reveals a fundamental distinction between kinematic danger and semantic non-compliance. Deep-Flow identifies a critical predictability gap by surfacing out-of-distribution behaviors, such as lane-boundary violations and non-normative junction maneuvers, that traditional safety filters overlook. This work provides a mathematically rigorous foundation for defining statistical safety gates, enabling objective, data-driven validation for the safe deployment of autonomous fleets.
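
The two core ingredients named in the abstract, the PCA bottleneck and the simulation-free flow-matching objective, can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration rather than the paper's implementation: the synthetic trajectories, the helper names (`encode`, `decode`, `cfm_batch`, `cfm_loss`), and the straight-line interpolant between a Gaussian sample at t=0 and a data sample at t=1. Minibatch optimal-transport coupling, goal conditioning, and kinematic-complexity weighting are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for expert trajectories (assumption): N paths of T (x, y) waypoints.
N, T, K = 256, 40, 12  # K = 12 is the bottleneck size quoted in the TL;DR
t_grid = np.linspace(0.0, 1.0, T)
speed = rng.uniform(5.0, 15.0, size=(N, 1))
coef = rng.normal(size=(N, 4))
modes = np.sin(np.pi * np.arange(1, 5)[:, None] * t_grid[None, :])  # (4, T)
x = speed * t_grid            # longitudinal progress, shape (N, T)
y = coef @ modes              # smooth lateral motion, shape (N, T)
trajs = np.stack([x, y], axis=-1).reshape(N, 2 * T)

# PCA bottleneck: keep the top-K right singular vectors of the centered data.
mean = trajs.mean(axis=0)
_, _, vt = np.linalg.svd(trajs - mean, full_matrices=False)
basis = vt[:K]                # (K, 2T), rows are orthonormal


def encode(traj):
    """Project flattened trajectories onto the K spectral coefficients."""
    return (traj - mean) @ basis.T


def decode(z):
    """Map spectral coefficients back to flattened (x, y) waypoints."""
    return z @ basis + mean


def cfm_batch(z1, rng):
    """Sample a point on the straight path from noise (t=0) to data (t=1)."""
    z0 = rng.standard_normal(z1.shape)        # Gaussian prior sample
    t = rng.uniform(size=(z1.shape[0], 1))    # one time per sample
    zt = (1.0 - t) * z0 + t * z1              # linear interpolant
    target = z1 - z0                          # constant velocity along the path
    return zt, t, target


def cfm_loss(pred, target):
    """Mean squared error between predicted and target velocities."""
    return ((pred - target) ** 2).sum(axis=1).mean()


z = encode(trajs)
zt, t, target = cfm_batch(z, rng)
# A real model would predict the velocity from (zt, t, context); a zero
# predictor is used here only to show the loss call.
loss = cfm_loss(np.zeros_like(target), target)
```

Because the regression target is available in closed form from the pair (z0, z1), training never has to integrate the ODE, which is what "simulation-free" refers to. Working in K coefficients rather than 2T raw coordinates also means every decoded trajectory is a combination of the retained basis vectors.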

Paper Structure (40 sections, 13 equations, 10 figures, 2 tables)


Figures (10)

  • Figure 1: Overview of the Deep-Flow Framework. (Left) We observe an agent's trajectory within a goal-conditioned context. While both safe (Blue) and anomalous (Orange) maneuvers may reach the same goal, they represent different densities on the driving manifold. (Center) Trajectories are projected into a low-rank spectral manifold where backward ODE integration ($t=1 \to 0$) maps maneuvers to a Gaussian prior. (Right) Deep-Flow identifies safety-critical anomalies by mapping non-normative behaviors to the low-probability tails of the expert distribution, providing a continuous and mathematically rigorous safety score.
  • Figure 2: Deep-Flow Encoder Architecture. Heterogeneous modalities are tokenized and fused via a Hierarchical Transformer. The Goal signal is injected twice: once in the global context and once as a direct skip-connection to the Flow Head to preserve intent-integrity.
  • Figure 3: ROC Analysis. Deep-Flow provides a more reliable signal for long-tail event detection than discrete heuristics.
  • Figure 4: Likelihood Distribution Analysis. The KDE plot illustrates a clear separation between nominal and critical regimes. Nominal driving (Blue) is bimodal, capturing both high-certainty and complex maneuvers. Critical events (Red) are notably absent from the high-likelihood mode, demonstrating that safety-critical anomalies are fundamentally restricted to the low-probability tails of the expert manifold.
  • Figure 5: Latent Flow Dynamics and Physical Grounding. (a) In the spectral latent space, nominal trajectories align with the vector field to reach high-density regions, while anomalies "fight" the flow. (b) Deep-Flow identifies semantic violations: the anomalous scenario shows the actual path (Red) deviating from the learned expert manifold (Cyan), signifying an OOD event.
  • ...and 5 more figures
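
The scoring step described in the Figure 1 caption, backward ODE integration from t=1 to t=0 combined with the Jacobian trace, can be sketched as follows. This is a hedged illustration under stated assumptions, not the paper's code: the vector field `toy_field` is a hypothetical linear field chosen because its likelihood has a closed form, the integrator is a simple midpoint rule, and the Jacobian diagonal is obtained by central differences because plain NumPy has no autodiff (in an autodiff framework each of the 12 entries would be an exact derivative). The identity used is the standard continuous change of variables, log p_1(z_1) = log p_0(z_0) - ∫ tr(∂v/∂z) dt.

```python
import numpy as np

K = 12  # latent dimension, matching the bottleneck size quoted in the TL;DR


def jacobian_trace(v, z, t, eps=1e-5):
    """Sum the diagonal Jacobian entries dv_i/dz_i, one coordinate at a time."""
    tr = 0.0
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        tr += (v(z + dz, t)[i] - v(z - dz, t)[i]) / (2.0 * eps)
    return tr


def log_likelihood(v, z1, n_steps=200):
    """Integrate the state and the trace integral backward from t=1 to t=0."""
    z = np.asarray(z1, dtype=float).copy()
    h = 1.0 / n_steps
    trace_integral = 0.0
    for step in range(n_steps):
        t = 1.0 - step * h
        # Midpoint step in reverse time.
        z_mid = z - 0.5 * h * v(z, t)
        t_mid = t - 0.5 * h
        trace_integral += h * jacobian_trace(v, z_mid, t_mid)
        z = z - h * v(z_mid, t_mid)
    # z now approximates the point under the standard Gaussian prior at t=0.
    log_prior = -0.5 * (z @ z) - 0.5 * z.size * np.log(2.0 * np.pi)
    return log_prior - trace_integral


# Hypothetical linear field v(z, t) = rates * z, used only as a checkable example.
rates = np.linspace(-0.5, 1.0, K)


def toy_field(z, t):
    return rates * z


rng = np.random.default_rng(0)
z_nominal = rng.standard_normal(K)
z_outlier = 5.0 * z_nominal                 # pushed far into the tails

ll_nominal = log_likelihood(toy_field, z_nominal)
ll_outlier = log_likelihood(toy_field, z_outlier)

# Closed-form density for this field: z1 = exp(rates) * z0 with z0 ~ N(0, I).
ll_exact = (-0.5 * np.sum((z_nominal * np.exp(-rates)) ** 2)
            - 0.5 * K * np.log(2.0 * np.pi) - rates.sum())
```

With only 12 latent dimensions, evaluating every diagonal entry of the Jacobian is cheap, so the trace is deterministic and no stochastic trace estimator is needed. The negative of the returned log-likelihood can serve as a continuous anomaly score: in this toy setup the outlier receives a lower log-likelihood than the nominal sample.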