Game-Theoretic Risk-Shaped Reinforcement Learning for Safe Autonomous Driving

Dong Hu, Fenqing Hu, Lidong Yang, Chao Huang

TL;DR

This work tackles safe autonomous driving by addressing the limits of reward-focused RL in dynamic, multi-agent traffic. It introduces GTR2L, a game-theoretic risk-shaped RL framework that fuses a multi-level game-theoretic world model with an adaptive prediction horizon, uncertainty-aware reachability via a barrier mechanism, and risk-constrained policy optimization under a CMDP formulation with a long-term safety bound $f(s^m,a^m) \,\le\, f_0$. The approach leverages an ensemble-based model to capture both epistemic and aleatoric uncertainty and integrates K-level reasoning to anticipate interactions across agents, enabling proactive risk avoidance. Empirical results in SUMO and CARLA show that GTR2L achieves higher success rates, lower collision and violation rates, and better driving efficiency and comfort compared to strong baselines and human drivers, validating the efficacy of the proposed risk-aware planning paradigm. The work advances safe AD by combining game-theoretic interaction modeling, adaptive horizon planning, and principled risk constraints, with implications for robust decision-making under uncertainty in real-world driving.
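For orientation, the long-term safety bound can be read against a generic discounted CMDP, sketched below. The symbols $r$, $c$, $\gamma$, $J_r$, $J_c$, and $d$ are assumptions introduced here for illustration and are not taken from the paper. Under this reading, $J_c$ plays the role of $f$ and $d$ the role of $f_0$, but the paper's exact definitions may differ.

$$
\max_{\pi}\; J_r(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
J_c(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le d
$$

Here $r$ is the driving reward, $c$ is a per-step risk cost, and $d$ is the budget on accumulated risk, so the policy is optimized for reward only among policies whose expected long-term risk stays within the budget.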

Abstract

Ensuring safety in autonomous driving (AD) remains a significant challenge, especially in highly dynamic and complex traffic environments where diverse agents interact and unexpected hazards frequently emerge. Traditional reinforcement learning (RL) methods often struggle to balance safety, efficiency, and adaptability, as they primarily focus on reward maximization without explicitly modeling risk or safety constraints. To address these limitations, this study proposes a novel game-theoretic risk-shaped RL (GTR2L) framework for safe AD. GTR2L incorporates a multi-level game-theoretic world model that jointly predicts the interactive behaviors of surrounding vehicles and their associated risks, along with an adaptive rollout horizon that adjusts dynamically based on predictive uncertainty. Furthermore, an uncertainty-aware barrier mechanism enables flexible modulation of safety boundaries. A dedicated risk modeling approach is also proposed, explicitly capturing both epistemic and aleatoric uncertainty to guide constrained policy optimization and enhance decision-making in complex environments. Extensive evaluations across diverse and safety-critical traffic scenarios show that GTR2L significantly outperforms state-of-the-art baselines, including human drivers, in terms of success rate, collision and violation reduction, and driving efficiency. The code is available at https://github.com/DanielHu197/GTR2L.
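The two uncertainty types and the adaptive rollout horizon mentioned in the abstract can be illustrated with a minimal sketch. It assumes a scalar-output ensemble in which each member predicts a Gaussian mean and variance, and it uses a simple threshold rule to truncate the rollout. The function names, the threshold rule, and all parameters are hypothetical and are not taken from the GTR2L code.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(member_means, member_variances):
    """Split the predictive uncertainty of a Gaussian ensemble (scalar output).

    member_means[i] and member_variances[i] are the mean and variance
    predicted by ensemble member i for the same input.
    Returns (aleatoric, epistemic).
    """
    # Aleatoric: average noise level the members themselves predict.
    aleatoric = fmean(member_variances)
    # Epistemic: disagreement between members, measured as the
    # population variance of their predicted means.
    epistemic = pvariance(member_means)
    return aleatoric, epistemic


def adaptive_horizon(step_uncertainties, threshold, max_horizon):
    """Count how many rollout steps to keep (hypothetical truncation rule).

    The rollout stops before the first step whose uncertainty exceeds
    `threshold`, and never runs longer than `max_horizon` steps.
    """
    horizon = 0
    for u in step_uncertainties[:max_horizon]:
        if u > threshold:
            break
        horizon += 1
    return horizon


if __name__ == "__main__":
    aleatoric, epistemic = decompose_uncertainty([1.0, 3.0], [0.5, 1.5])
    print(aleatoric, epistemic)
    print(adaptive_horizon([0.1, 0.2, 0.9, 0.1], threshold=0.5, max_horizon=10))
```

With this rule, the model is trusted for fewer steps when its predictions become uncertain, which is one plausible reading of a horizon that "adjusts dynamically based on predictive uncertainty".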

Paper Structure

This paper contains 47 sections, 35 equations, 9 figures, 3 tables, 1 algorithm.

Figures (9)

  • Figure 1: The framework of game-theoretic risk-shaped RL for safe autonomous driving, which leverages a game-theoretic adaptive-horizon world model, potential risk modeling, reachability (barrier) modeling, and risk-constrained RL to enhance safety and robustness in dynamic traffic environments.
  • Figure 2: Regular traffic scenarios: (a) unprotected left-turn at an unsignalized intersection (Scenario 1), (b) unprotected straight-crossing at an unsignalized intersection (Scenario 2), (c) consecutive signalized intersections with mixed traffic flows (Scenario 3), and (d) long-distance composite scenario with mixed traffic flows (Scenario 4).
  • Figure 3: Safety-critical scenarios: (a) unprotected left-turn with pedestrian crossing at an urban intersection (Scenario 5), and (b) highway scenario with random emergency events (Scenario 6), where either a sudden emergency stop or a sudden cut-in occurs randomly in each episode.
  • Figure 4: The experimental driving simulator platform is specifically designed for the evaluation of safety-critical scenarios. Within this environment, a human participant operates the ego vehicle through the use of a steering wheel and pedal set. The system comprises a dedicated computing unit and three high-resolution monitors, offering high-fidelity visual feedback to ensure an immersive and realistic driving experience during testing.
  • Figure 5: Training performance of different autonomous driving agents on Scenario 3 (consecutive signalized intersections) under varying traffic flow densities. Flow-1: (a) Success rate, (b) Collision rate, (c) Traffic-light violation rate; Flow-2: (e) Success rate, (f) Collision rate, (g) Traffic-light violation rate.
  • ...and 4 more figures