EasyChauffeur: A Baseline Advancing Simplicity and Efficiency on Waymax

Lingyu Xiao, Jiang-Jiang Liu, Xiaoqing Ye, Wankou Yang, Jingdong Wang

TL;DR

EasyChauffeur argues for shifting focus from architecture-centric improvements to training strategy, data efficiency, and robust evaluation in autonomous driving planning. It shows that on-policy RL (PPO) can achieve strong performance with only a small fraction of the training data, and that SNE-Sampling further improves data efficiency by selecting representative samples from the encoder's latent space. The paper also introduces Ego-Shifting to reveal robustness gaps in closed-loop evaluation and demonstrates RL's superior robustness under these perturbations. Together, these contributions suggest a holistic approach (data-aware training, efficient data selection, and robust evaluation) for advancing practical driving planners on GPU-accelerated simulators such as Waymax, built on the Waymo Open Motion Dataset (WOMD).
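For reference, PPO maximizes the standard clipped surrogate objective below. This is the generic formulation of the PPO algorithm, not necessarily EasyChauffeur's exact loss: any value-function or entropy terms, the advantage estimator, and the clip range used in the paper are not specified here.

```latex
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}, \qquad
L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[ \min\left( r_t(\theta)\,\hat{A}_t,\; \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right) \right]
```

Here \hat{A}_t is an advantage estimate and \epsilon is the clip range. The ratio r_t compares the policy being updated with the policy that just collected the rollouts, so fresh rollouts are gathered after each round of updates; this is what makes PPO on-policy.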

Abstract

Recent advancements in deep-learning-based driving planners have primarily focused on elaborate network engineering, yielding limited improvements. This paper diverges from conventional approaches by exploring three fundamental yet under-investigated aspects: training policy, data efficiency, and evaluation robustness. We introduce EasyChauffeur, a reproducible and effective planner for both imitation learning (IL) and reinforcement learning (RL) on Waymax, a GPU-accelerated simulator. Notably, our findings indicate that incorporating on-policy RL significantly boosts performance and data efficiency. To further enhance this efficiency, we propose SNE-Sampling, a novel method that selectively samples data from the encoder's latent space, substantially improving EasyChauffeur's performance with RL. Additionally, we identify a deficiency in current evaluation methods: they fail to accurately assess the robustness of different planners, even though minor changes in the ego vehicle's initial state can cause significant performance drops. In response, we propose Ego-Shifting, a new evaluation setting for assessing planners' robustness. Our findings advocate a shift from a primary focus on network architectures toward a holistic approach encompassing training strategies, data efficiency, and robust evaluation methods.
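The abstract describes SNE-Sampling only at a high level (selecting data from the encoder's latent space). As a rough illustration of what latent-space sample selection can look like, the sketch below runs plain k-means over precomputed scene-encoder embeddings and keeps the real scenario nearest each centroid. This is a swapped-in technique, not the paper's SNE-Sampling procedure; the function names, the use of k-means, and the farthest-point initialisation are all assumptions.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two latent vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def select_representatives(latents: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Return sorted indices of up to k scenarios that cover the latent space.

    Hypothetical stand-in for SNE-Sampling: k-means over encoder latents,
    then keep the actual sample closest to each centroid.
    """
    if not 0 < k <= len(latents):
        raise ValueError("k must be between 1 and len(latents)")
    # Farthest-point initialisation: deterministic and spreads seeds apart.
    centroids = [list(latents[0])]
    while len(centroids) < k:
        far = max(latents, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        # Assign every latent vector to its nearest centroid.
        clusters: List[List[Vector]] = [[] for _ in centroids]
        for p in latents:
            j = min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        # Move each centroid to the mean of its members (keep it if empty).
        centroids = [mean(cl) if cl else c for cl, c in zip(clusters, centroids)]
    # Keep the real scenario nearest each centroid, skipping duplicates.
    chosen: List[int] = []
    for c in centroids:
        idx = min(range(len(latents)), key=lambda i: sq_dist(latents[i], c))
        if idx not in chosen:
            chosen.append(idx)
    return sorted(chosen)
```

Calling select_representatives(latents, k) yields indices into the training set; training the RL planner on only that subset is how a selection step of this kind would plug into the data-efficiency experiments. Picking real samples rather than centroids keeps every selected item a valid, simulatable scenario.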

Paper Structure (37 sections, 8 equations, 12 figures, 5 tables)

Figures (12)

  • Figure 1: Comparison of the robustness of EasyChauffeur-IL and EasyChauffeur-PPO to the initial state under closed-loop evaluation. 'Tgt. point' stands for the target point. 'ego agent ref.' refers to the initial state of the ego agent used as a reference when evaluated under Ego-Shifting. The current evaluation setting may not fully assess the planners' robustness: in (a) and (c), both models reach the target successfully. However, when evaluated under Ego-Shifting, EasyChauffeur-IL (b) experiences a collision, while EasyChauffeur-PPO (d) demonstrates strong robustness. More visualisation results can be found in the supplementary materials.
  • Figure 2: Overall pipeline of EasyChauffeur.
  • Figure 3: Illustration of SNE-Sampling. The scene encoder is taken from a pre-trained EasyChauffeur model.
  • Figure 4: Diagram of Ego-Shifting. We transform the ego agent's initial state in the x-y plane and about the yaw axis; each transformed value is sampled from a Gaussian distribution.
  • Figure 5: The performance of EasyChauffeur is evaluated under various training policies, Ego-Shifting properties, and action spaces.
  • ...and 7 more figures
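Figure 4 describes Ego-Shifting as a Gaussian perturbation of the ego agent's initial position (x-y plane) and heading (yaw). The sketch below illustrates that idea under stated assumptions: the EgoState type, the standard deviations sigma_xy and sigma_yaw, zero-mean noise, and applying the offsets in the world frame are all hypothetical choices, since the paper's exact parameters and coordinate frame are not given here.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EgoState:
    """Hypothetical initial ego pose: position in metres, yaw in radians."""
    x: float
    y: float
    yaw: float


def wrap_angle(theta: float) -> float:
    """Wrap an angle to the interval [-pi, pi)."""
    return (theta + math.pi) % (2 * math.pi) - math.pi


def ego_shift(state: EgoState, sigma_xy: float, sigma_yaw: float,
              rng: random.Random) -> EgoState:
    """Perturb the ego pose with independent zero-mean Gaussian offsets.

    sigma_xy and sigma_yaw are assumed knobs, not values from the paper.
    Offsets are added in the world frame (an assumption).
    """
    dx = rng.gauss(0.0, sigma_xy)
    dy = rng.gauss(0.0, sigma_xy)
    dyaw = rng.gauss(0.0, sigma_yaw)
    return EgoState(state.x + dx, state.y + dy, wrap_angle(state.yaw + dyaw))
```

For a robustness sweep, one would draw several shifted starts per scenario, e.g. `[ego_shift(s, 1.0, 0.1, rng) for _ in range(5)]`, roll out the planner from each, and compare metrics against the unshifted reference start. Seeding the random.Random instance keeps the perturbed evaluation set reproducible across planners.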