Learning Safe Autonomous Driving Policies Using Predictive Safety Representations

Mahesh Keswani, Raunak Bhattacharyya

TL;DR

This work assesses the real-world viability of SRPL, a predictive safety representation that augments SafeRL for autonomous driving, using the Waymo Open Motion Dataset (WOMD) and NuPlan. SRPL feeds the output of a Steps-to-Cost model into the policy input and trains that model jointly with the RL agent to make exploration more safety-aware. Across multiple baselines, SRPL improves the reward-safety tradeoff and robustness to sensor noise, and it shows asymmetric cross-dataset generalization that favors training on the more diverse data. The findings provide practical guidance on algorithm choices for SRPL-enabled SafeRL and highlight domain-specific factors that influence effectiveness.
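To make the input augmentation concrete, here is a minimal sketch of how a safety representation could be concatenated with the raw state before it reaches the policy. It is illustrative only: the single linear layer, the four horizon bins, the weights, and the function names `steps_to_cost` and `augment_state` are all assumptions, not the paper's implementation, and the joint training loop is omitted.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def steps_to_cost(state, weights):
    """Hypothetical stand-in for the Steps-to-Cost model: one linear layer + softmax.

    Returns a distribution over len(weights) horizon bins (an assumed output format).
    """
    logits = [sum(w * x for w, x in zip(row, state)) for row in weights]
    return softmax(logits)

def augment_state(state, weights):
    """Concatenate the raw state with the safety representation to form the policy input."""
    return list(state) + steps_to_cost(state, weights)

# Toy example: a 3-dimensional state and 4 assumed horizon bins.
state = [0.5, -1.0, 2.0]
weights = [
    [0.1, 0.0, 0.2],
    [0.0, 0.3, -0.1],
    [0.2, 0.2, 0.0],
    [-0.1, 0.0, 0.1],
]
policy_input = augment_state(state, weights)
print(len(policy_input))
```

The policy then consumes `policy_input` in place of the raw state, so the underlying SafeRL optimizer needs no change beyond a wider input layer.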

Abstract

Safe reinforcement learning (SafeRL) is a prominent paradigm for autonomous driving, where agents are required to optimize performance under strict safety requirements. This dual objective creates a fundamental tension, as overly conservative policies limit driving efficiency while aggressive exploration risks safety violations. The Safety Representations for Safer Policy Learning (SRPL) framework addresses this challenge by equipping agents with a predictive model of future constraint violations and has shown promise in controlled environments. This paper investigates whether SRPL extends to real-world autonomous driving scenarios. Systematic experiments on the Waymo Open Motion Dataset (WOMD) and NuPlan demonstrate that SRPL can improve the reward-safety tradeoff, achieving statistically significant improvements in success rate (effect sizes r = 0.65-0.86) and cost reduction (effect sizes r = 0.70-0.83), with p < 0.05 for observed improvements. However, its effectiveness depends on the underlying policy optimizer and the dataset distribution. The results further show that predictive safety representations play a critical role in improving robustness to observation noise. Additionally, in zero-shot cross-dataset evaluation, SRPL-augmented agents demonstrate improved generalization compared to non-SRPL methods. These findings collectively demonstrate the potential of predictive safety representations to strengthen SafeRL for autonomous driving.

Paper Structure

This paper contains 16 sections, 5 equations, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: Integration of SRPL with SafeRL algorithms: Raw Waymo Open Motion Dataset (WOMD) scenarios are converted to MetaDrive-compatible format using ScenarioNet, generating bird's-eye view driving environments. The baseline SafeRL approach (blue) uses raw state observations as direct input to policy $\pi_\theta$. The SRPL framework (green) augments decision-making by concatenating raw states with predictive safety information from the Steps-to-Cost model $S_\nu(s)$ before policy execution. Pink arrows indicate the ego-agent's position in each scenario.
  • Figure 2: Comparison of training performance on WOMD between baseline SafeRL algorithms and their SRPL-augmented counterparts. The left plots show average returns (higher is better), while the right plots show average costs (lower is better).
  • Figure 3: Average reward (left) and cost (right) versus Gaussian noise level ($\sigma$) applied to Lidar observations, evaluated on 500 WOMD scenarios per noise level. Higher rewards and lower costs indicate better performance.
  • Figure 4: Comparison of baseline and SRPL-augmented mean action outputs for different noise levels.
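The noise-robustness setup described for Figure 3 can be sketched as a sweep over Gaussian noise levels applied to a Lidar reading. This is a hedged illustration: the normalized range of [0, 1], the clipping step, the chosen values of $\sigma$, and the helper names are assumptions, and the sketch measures only how far the observation is perturbed rather than running any policy.

```python
import random

def add_lidar_noise(lidar, sigma, rng, low=0.0, high=1.0):
    """Add zero-mean Gaussian noise to each Lidar ray, then clip to [low, high].

    Clipping to a normalized range is an assumption, not taken from the paper.
    """
    return [min(high, max(low, r + rng.gauss(0.0, sigma))) for r in lidar]

def mean_abs_deviation(clean, noisy):
    """Average absolute change per ray between the clean and noisy scans."""
    return sum(abs(a - b) for a, b in zip(clean, noisy)) / len(clean)

rng = random.Random(0)  # fixed seed so the sweep is repeatable
clean_scan = [0.2, 0.5, 0.9, 1.0, 0.05]  # toy normalized Lidar distances

sweep = {}
for sigma in [0.0, 0.1, 0.2]:  # illustrative noise levels
    noisy_scan = add_lidar_noise(clean_scan, sigma, rng)
    sweep[sigma] = mean_abs_deviation(clean_scan, noisy_scan)

for sigma, deviation in sweep.items():
    print(f"sigma={sigma:.1f} mean |delta|={deviation:.3f}")
```

In an evaluation along these lines, each noisy scan would be passed to the baseline and SRPL-augmented policies, and reward and cost would be averaged over the evaluation scenarios at each noise level.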