Perception Without Vision for Trajectory Prediction: Ego Vehicle Dynamics as Scene Representation for Efficient Active Learning in Autonomous Driving

Ross Greer, Mohan Trivedi

TL;DR

The paper addresses data efficiency for autonomous driving trajectory prediction by leveraging ego-vehicle trajectory and dynamic state information as a low-cost proxy for scene awareness. It introduces a novelty-sensitive active learning framework that clusters trajectory-states and uses budget-aware sampling parameters ($α$ and $β$) to selectively annotate data without relying on model uncertainty. By defining a trajectory-state similarity metric and employing hierarchical clustering, the approach identifies novel and familiar data clusters to guide data acquisition, achieving consistent gains over random sampling on the nuScenes dataset and evidencing a data-typicality phase transition. The work demonstrates that beginning with typical data and gradually increasing novelty yields efficient learning, with potential extensions to object detection and path planning, thereby enabling safer, more data-efficient autonomous driving systems.
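The clustering step described above can be illustrated with a minimal sketch. The paper's actual trajectory-state similarity metric and clustering settings are not given in this summary, so everything named here is an assumption made for illustration: the `(x, y, speed)` state layout, the `state_distance` function, the `dyn_weight` parameter, average linkage, and the `threshold` cutoff. Trajectory-states that never merge stay as singleton clusters, loosely mirroring the "unclustered" items mentioned in the figure captions.

```python
import math
from itertools import combinations


def state_distance(a, b, dyn_weight=1.0):
    """Hypothetical distance between two trajectory-states.

    Each trajectory-state is a list of (x, y, speed) tuples of equal length.
    This is an illustrative stand-in, not the metric defined in the paper.
    """
    total = 0.0
    for (xa, ya, va), (xb, yb, vb) in zip(a, b):
        total += math.hypot(xa - xb, ya - yb) + dyn_weight * abs(va - vb)
    return total / len(a)


def agglomerate(items, threshold):
    """Average-linkage agglomerative clustering, stopped at `threshold`.

    Returns a list of clusters, each a list of indices into `items`.
    """
    clusters = [[i] for i in range(len(items))]

    def linkage(c1, c2):
        pair_sum = sum(state_distance(items[i], items[j]) for i in c1 for j in c2)
        return pair_sum / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters (p < q is guaranteed by combinations).
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        if linkage(clusters[p], clusters[q]) > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Two near-identical straight paths and one accelerating turn (made-up data).
straight_a = [(t, 0.0, 5.0) for t in range(4)]
straight_b = [(t, 0.1, 5.0) for t in range(4)]
turn = [(t, t * t, 3.0) for t in range(4)]

clusters = agglomerate([straight_a, straight_b, turn], threshold=1.0)
print(clusters)  # -> [[0, 1], [2]]
```

In this toy run the two straight paths merge into one familiar cluster, while the turn stays alone and would count as novel. For realistic dataset sizes, a library implementation of hierarchical clustering would be a more practical choice than this quadratic loop.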

Abstract

This study investigates the use of trajectory and dynamic state information for efficient data curation in autonomous driving machine learning tasks. We propose methods for clustering trajectory-states and sampling strategies in an active learning framework, aiming to reduce annotation and data costs while maintaining model performance. Our approach leverages trajectory information to guide data selection, promoting diversity in the training data. We demonstrate the effectiveness of our methods on the trajectory prediction task using the nuScenes dataset, showing consistent performance gains over random sampling across different data pool sizes, and even reaching sub-baseline displacement errors at just 50% of the data cost. Our results suggest that sampling typical data initially helps overcome the "cold start problem," while introducing novelty becomes more beneficial as the training pool size increases. By integrating trajectory-state-informed active learning, we demonstrate that more efficient and robust autonomous driving systems are possible and practical using low-cost data curation strategies.
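The trade-off between typical and novel data can be sketched as a budgeted selection routine. This is a hedged illustration, not the paper's algorithm: the names `select_batch`, `novel_fraction`, and `max_depth` are hypothetical, and this summary does not state how they would map onto the paper's $α$ and $β$ parameters. The sketch assumes candidates have already been split into clusters that resemble existing training data (typical) and unmatched items (novel).

```python
import random


def select_batch(typical_clusters, novel_items, budget, novel_fraction, max_depth, seed=0):
    """Pick up to `budget` items for annotation (illustrative only).

    typical_clusters: list of lists of candidate ids resembling training data.
    novel_items: list of candidate ids that matched no existing cluster.
    novel_fraction: assumed share of the budget spent on novel items (0..1).
    max_depth: assumed cap on how many items are drawn from any one cluster.
    """
    rng = random.Random(seed)

    # Spend the novel share of the budget first.
    n_novel = min(round(budget * novel_fraction), len(novel_items))
    picked = rng.sample(novel_items, n_novel)

    # Fill the remainder round-robin across typical clusters,
    # never going deeper than `max_depth` into a single cluster.
    pools = [rng.sample(c, min(max_depth, len(c))) for c in typical_clusters]
    depth = 0
    while len(picked) < budget and depth < max_depth:
        for pool in pools:
            if depth < len(pool) and len(picked) < budget:
                picked.append(pool[depth])
        depth += 1
    return picked


typical = [["t1", "t2", "t3"], ["t4", "t5"]]
novel = ["n1", "n2", "n3"]

# Small pool: favor typical data (cold start). Larger pool: raise novel_fraction.
early_batch = select_batch(typical, novel, budget=4, novel_fraction=0.0, max_depth=2)
later_batch = select_batch(typical, novel, budget=4, novel_fraction=0.5, max_depth=1)
```

Under these assumptions, a schedule that starts with `novel_fraction` near 0 and raises it as the training pool grows would follow the pattern the abstract reports. Note that a small `max_depth` can leave the budget under-filled when there are few clusters.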

Paper Structure (11 sections, 1 equation, 9 figures, 1 table, 1 algorithm)

This paper contains 11 sections, 1 equation, 9 figures, 1 table, 1 algorithm.

Figures (9)

  • Figure 1: In supervised learning for tasks such as trajectory prediction, data is collected (yellow), annotated and added to a training pool (blue), and then a model is trained (purple). When more data is collected than can be afforded by an annotation or computational budget, intelligent sampling using active learning (white) may provide solutions which maintain model performance at reduced data cost. We contribute algorithms for clustering of trajectory-states and sampling strategies which are model-agnostic, providing a benefit of active learning based only on the current training data and without requiring computation of uncertainty from the partially-trained model.
  • Figure 2: We randomly select 12 clusters, formed using our distance measurement over trajectory-states (which include trajectory coordinates and vehicle dynamics). Comparing across the selected clusters, clear patterns emerge even over the 2D coordinates alone (visualized), showing the effectiveness of grouping like-trajectories.
  • Figure 3: Many trajectory-states remain unclustered due to sufficient distance from all nearest trajectory-state clusters. We randomly sample just 20 of these unmatched trajectory-states, visualizing the 2D path coordinates and illustrating the diversity of behaviors found to be unique within the dataset.
  • Figure 4: These five graphs show the minimum average displacement error ($mADE_5$) of various parameterizations of the active learning strategy relative to a random baseline, considering the 5 most likely trajectory predictions from the model. Positive numbers indicate improvement over random. From left to right, each graph has a different training pool size, with the amount of data in the training pool increasing from 10% to 50% of nuScenes (in 10% increments). The y-axis represents improvement over random, while the x-axis represents the allowable "depth" into a cluster that the algorithm samples. Each colored line represents a different proportion of unique (novel, diverse) data, versus resampled data which is similar (typical) to data already in the training pool. The point we seek to highlight is the change in position of the yellow line (all novel data) and the red line (all typical data). As the annotation budget or training pool size increases, these two trends effectively switch roles in over- (or under-) performing relative to the random baseline. This pattern matches the findings of Guy et al. in image classification tasks, providing evidence for the presence of the active learning phase transition in the trajectory prediction task, within the bounds of the nuScenes dataset size. Sampling typical data helps in overcoming a cold start, while novel data should be sampled in higher proportion as the training pool grows.
  • Figure 5: These five graphs show the minimum average displacement error ($mADE_{10}$) of various parameterizations of the active learning strategy relative to a random baseline, considering the 10 most likely trajectory predictions from the model. Positive numbers indicate improvement over random. From left to right, each graph has a different training pool size, with the amount of data in the training pool increasing from 10% to 50% of nuScenes (in 10% increments). The y-axis represents improvement over random, while the x-axis represents the allowable "depth" into a cluster that the algorithm samples. Each colored line represents a different proportion of unique (novel, diverse) data, versus resampled data which is similar (typical) to data already in the training pool. We observe the same pattern as noted in the graphs of $mADE_5$: the strategy which samples novel data and the strategy which samples typical data swap places in performance.
  • ...and 4 more figures
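
The $mADE_k$ metric referenced in the captions for Figures 4 and 5 can be sketched as follows. This follows the common definition of minimum average displacement error (the lowest mean Euclidean error among the top $k$ predicted trajectories). The function names are hypothetical, and treating "improvement over random" as a simple difference in error is an assumption, since this summary does not state whether the paper reports absolute or relative improvement.

```python
import math


def min_ade(predictions, ground_truth, k):
    """Minimum average displacement error over the top-k predictions.

    predictions: list of trajectories, ordered most likely first;
                 each trajectory is a list of (x, y) points.
    ground_truth: list of (x, y) points of the same length.
    """
    def ade(pred):
        gaps = [math.dist(p, g) for p, g in zip(pred, ground_truth)]
        return sum(gaps) / len(ground_truth)

    return min(ade(pred) for pred in predictions[:k])


def improvement_over_random(strategy_error, random_error):
    """Assumed convention: positive when the strategy has lower error."""
    return random_error - strategy_error


# Made-up example: the second-ranked prediction is closer to the truth.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
preds = [
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],  # constant 1.0 offset
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.3)],  # only the last point is off
]

top1 = min_ade(preds, truth, k=1)
top2 = min_ade(preds, truth, k=2)
```

Because a larger $k$ only adds candidates to the minimum, $mADE_{10}$ can never exceed $mADE_5$ for the same ranked predictions.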