Lane-Frame Quantum Multimodal Driving Forecasts for the Trajectory of Autonomous Vehicles

Navneet Singh, Shiva Raj Pokhrel

TL;DR

This work evaluates a compact hybrid quantum architecture for short-horizon, multi-modal trajectory forecasting in autonomous driving, operating in an ego-centric lane frame and predicting residuals over a kinematic baseline. It combines a 9-qubit quantum attention encoder, a deep but lightweight quantum feedforward network, and a Fourier-based quantum decoder that generates 16 trajectory hypotheses in a single pass, with spectrum-based confidences guiding ranking. Training employs gradient-free SPSA and a min-over-modes loss to achieve stable optimization and meaningful multi-modal forecasts on the Waymo Open Motion Dataset, outperforming a strong lane-following baseline. While not claiming quantum advantage, the study demonstrates that very small quantum circuits can be integrated into a resource-efficient forecasting pipeline, delivering meter-scale errors and diverse futures that are useful for real-time decision making. The results suggest a feasible path to leveraging quantum sub-modules in robotics and autonomous systems under tight compute budgets, with potential extensions to richer context and hardware deployment.
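
As a rough illustration of the training recipe summarized above, the following Python sketch pairs an SPSA update with a min-over-modes loss. It is not the authors' implementation: `toy_model` is a hypothetical stand-in for the quantum circuit, the mode count and horizon are shrunk for readability, and the gain constants and decay exponents (0.602 and 0.101) are commonly used SPSA defaults assumed here, not values reported in the paper.

```python
import numpy as np

def min_over_modes_loss(pred, target):
    """pred: (K, T, 2) hypotheses, target: (T, 2). Returns the ADE of the best mode."""
    ade_per_mode = np.linalg.norm(pred - target[None], axis=-1).mean(axis=-1)
    return float(ade_per_mode.min())

def spsa_step(loss_fn, theta, a, c, rng):
    """One SPSA update: two loss evaluations, no analytic gradient needed."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Rademacher perturbation
    loss_plus = loss_fn(theta + c * delta)
    loss_minus = loss_fn(theta - c * delta)
    # For +/-1 entries, dividing by delta equals multiplying by delta.
    grad_est = (loss_plus - loss_minus) / (2.0 * c) * delta
    return theta - a * grad_est

def toy_model(theta, num_modes=4, horizon=5):
    """Hypothetical stand-in for the circuit: parameters are read as K trajectories."""
    return theta.reshape(num_modes, horizon, 2)

rng = np.random.default_rng(0)
target = np.stack([np.linspace(0.0, 2.0, 5), np.zeros(5)], axis=-1)
theta = rng.normal(size=4 * 5 * 2)
loss_fn = lambda th: min_over_modes_loss(toy_model(th), target)

initial_loss = loss_fn(theta)
for k in range(500):
    a_k = 0.05 / (k + 1) ** 0.602  # assumed step-size decay
    c_k = 0.10 / (k + 1) ** 0.101  # assumed perturbation decay
    theta = spsa_step(loss_fn, theta, a_k, c_k, rng)
final_loss = loss_fn(theta)
print(initial_loss, final_loss)
```

Because the loss takes the minimum over modes, only the currently best hypothesis receives a learning signal at each step, which leaves the other modes free to cover different futures.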

Abstract

Trajectory forecasting for autonomous driving must deliver accurate, calibrated multi-modal futures under tight compute and latency constraints. We propose a compact hybrid quantum architecture that aligns quantum inductive bias with road-scene structure by operating in an ego-centric, lane-aligned frame and predicting residual corrections to a kinematic baseline instead of absolute poses. The model combines a transformer-inspired quantum attention encoder (9 qubits), a parameter-lean quantum feedforward stack (64 layers, $\sim$1200 trainable angles), and a Fourier-based decoder that uses shallow entanglement and phase superposition to generate 16 trajectory hypotheses in a single pass, with mode confidences derived from the latent spectrum. All circuit parameters are trained with Simultaneous Perturbation Stochastic Approximation (SPSA), avoiding backpropagation through non-analytic components. On the Waymo Open Motion Dataset, the model achieves a minADE (minimum Average Displacement Error) of 1.94 m and a minFDE (minimum Final Displacement Error) of 3.56 m across the 16 predicted modes over a 2.0 s horizon, consistently outperforming a kinematic baseline with reduced miss rates and strong recall. Ablations confirm that residual learning in the lane frame, truncated Fourier decoding, shallow entanglement, and spectrum-based ranking focus capacity where it matters, yielding stable optimization and reliable multi-modal forecasts from small, shallow quantum circuits on a modern autonomous-driving benchmark.
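
For readers unfamiliar with the metrics quoted above, here is a minimal Python sketch of minADE and minFDE over K predicted modes. The function names and the toy trajectories are illustrative assumptions. The `is_miss` helper assumes a simple definition (the best mode's final displacement exceeds a threshold), which may differ from the exact miss-rate definition used in the paper or by the Waymo benchmark.

```python
import numpy as np

def min_ade_fde(pred, gt):
    """pred: (K, T, 2) modes, gt: (T, 2). Returns (minADE, minFDE) in input units."""
    dist = np.linalg.norm(pred - gt[None], axis=-1)  # (K, T) per-step displacement
    min_ade = float(dist.mean(axis=1).min())         # best mode by average error
    min_fde = float(dist[:, -1].min())               # best mode by final-step error
    return min_ade, min_fde

def is_miss(pred, gt, threshold_m):
    """Assumed definition: no mode ends within threshold_m of the ground truth."""
    return min_ade_fde(pred, gt)[1] > threshold_m

# Toy example: ground truth moves 1 m per step along x.
gt = np.stack([np.arange(1.0, 5.0), np.zeros(4)], axis=-1)
pred = np.stack([gt + np.array([0.0, 0.5]),   # mode 0: 0.5 m lateral offset
                 gt + np.array([0.0, 2.0])])  # mode 1: 2.0 m lateral offset
min_ade, min_fde = min_ade_fde(pred, gt)
print(min_ade, min_fde, is_miss(pred, gt, threshold_m=2.0))
```

Note that the two minima are taken independently here, so the mode that attains minADE need not be the one that attains minFDE.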

Paper Structure

This paper contains 39 sections, 13 equations, 9 figures, 5 algorithms.

Figures (9)

  • Figure 1: Overview of the proposed quantum multi-modal trajectory prediction pipeline. SDV history is preprocessed into an ego-centric lane frame and split into a kinematic baseline and query-key-value features, which are encoded by a quantum attention encoder and a deep quantum feedforward stack to produce a latent vector. A quantum decoder, Fourier trajectory head, and confidence head generate multi-modal residual trajectories and mode probabilities that are added to the baseline to obtain final predictions. Solid arrows denote forward data flow, while dashed arrows show construction of residual targets from ground truth and the SPSA-based parameter updates used for training.
  • Figure 2: Global and step-level training signals. (a) Training and validation losses decrease smoothly and track closely, indicating good generalization under residual prediction and bounded-angle normalization. (b) Raw and smoothed loss show steady improvement with gentle undulations aligned with SPSA schedule resets. (c) Step-wise loss traces confirm the same trend at finer granularity.
  • Figure 3: Fine-grained learning signals and discrete convergence steps. (a) Finite-difference loss rate oscillates around a negative mean; sign flips reflect mode switches under the $\min$-over-modes objective while the global trend decreases. (b) Step-level ADE/FDE amplitudes contract over training, confining volatility to a shrinking subset of scenes. (c) Best-so-far validation ADE forms a staircase with drops at epochs $\sim$6, 18, 35, 52, 74, and 83, consistent with phase realignments between the shallow attention/decoder and the Fourier readout.
  • Figure 4: Convergence and stability diagnostics under SPSA. (a) Rolling ADE standard deviation drops by roughly an order of magnitude from early to late training. (b) Epoch-to-epoch improvement rate tapers after $\sim$70 epochs, indicating approach to a stable optimum. (c) Error variance settles into a narrow band. Together these trends indicate stable optimization and predictable convergence for the shallow 9-qubit circuits.
  • Figure 5: Accuracy overview across training. (a) ADE/FDE drop quickly in the first 10–15 epochs then taper, indicating fast capture of short-horizon kinematics with later refinement. (b) Miss@2m and Miss@4m decrease and stabilize. (c) Percentile ADE (P50–P99) shows substantial shrinkage of the bulk and mid-tail, with little change in the extreme tail. (d) P90–P100 and P95–P100 bands narrow steadily, indicating improved error concentration. (e) ADE vs. lane/CTRV baseline: the model opens and sustains a gap over the kinematic prior. (f) Horizon-wise error curves over 0s–2.0s show a consistent advantage across time.
  • ...and 4 more figures
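
The captions for Figures 1 and 5 refer to residual targets built against a kinematic baseline and to a lane/CTRV baseline. The Python sketch below shows one way such a residual could be constructed, assuming CTRV denotes the standard constant turn rate and velocity model. The function name, the 0.5 s sampling step, and the toy ground truth are hypothetical, and the lane-frame transform described in the paper is omitted.

```python
import numpy as np

def ctrv_rollout(speed, yaw, yaw_rate, times):
    """Positions (T, 2) of a constant turn rate and velocity model starting at the origin."""
    times = np.asarray(times, dtype=float)
    if abs(yaw_rate) < 1e-6:
        # Straight-line limit avoids division by a near-zero yaw rate.
        x = speed * times * np.cos(yaw)
        y = speed * times * np.sin(yaw)
    else:
        x = speed / yaw_rate * (np.sin(yaw + yaw_rate * times) - np.sin(yaw))
        y = speed / yaw_rate * (np.cos(yaw) - np.cos(yaw + yaw_rate * times))
    return np.stack([x, y], axis=-1)

times = np.arange(1, 5) * 0.5  # assumed 0.5 s steps out to the 2.0 s horizon
baseline = ctrv_rollout(speed=10.0, yaw=0.0, yaw_rate=0.0, times=times)

# Toy ground truth: the vehicle sits 0.3 m to the side of the baseline path.
ground_truth = baseline + np.array([0.0, 0.3])

residual_target = ground_truth - baseline      # what the network is trained to predict
final_prediction = baseline + residual_target  # residual added back at inference
print(residual_target)
```

Predicting the residual rather than the absolute pose means the learned part only has to account for deviations from simple kinematics, which are typically small over a short horizon.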