Game-Theoretic Risk-Shaped Reinforcement Learning for Safe Autonomous Driving
Dong Hu, Fenqing Hu, Lidong Yang, Chao Huang
TL;DR
Reward-focused RL handles dynamic, multi-agent traffic poorly because it does not explicitly model risk or safety constraints. This work introduces GTR2L, a game-theoretic risk-shaped RL framework for safe autonomous driving (AD). It combines a multi-level game-theoretic world model with an adaptive prediction horizon, uncertainty-aware reachability via a barrier mechanism, and risk-constrained policy optimization under a constrained Markov decision process (CMDP) formulation with a long-term safety bound $f(s^m,a^m) \le f_0$. An ensemble-based model captures both epistemic and aleatoric uncertainty, and K-level reasoning anticipates interactions among agents, enabling proactive risk avoidance. In SUMO and CARLA experiments, GTR2L achieves higher success rates, lower collision and violation rates, and better driving efficiency and comfort than strong baselines and human drivers. These results support combining game-theoretic interaction modeling, adaptive-horizon planning, and principled risk constraints for robust decision-making under uncertainty in real-world driving.
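To make the distinction between epistemic and aleatoric uncertainty concrete, the following is a minimal sketch of the standard variance decomposition for an ensemble of Gaussian predictors. It is not taken from the GTR2L code; the function name `decompose_uncertainty` and the list-based interface are assumptions chosen for illustration.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(means, variances):
    """Split the predictive variance of a Gaussian ensemble into two parts.

    means[i] and variances[i] are the mean and variance that ensemble
    member i predicts for one scalar quantity (hypothetical interface).
    Returns (epistemic, aleatoric, total).
    """
    if not means or len(means) != len(variances):
        raise ValueError("need one (mean, variance) pair per ensemble member")
    # Aleatoric: average noise level the members themselves predict.
    aleatoric = fmean(variances)
    # Epistemic: disagreement between the members' mean predictions.
    epistemic = pvariance(means)
    # Law of total variance for an equally weighted mixture.
    return epistemic, aleatoric, epistemic + aleatoric
```

Under this decomposition, member disagreement can in principle shrink with more data, while the averaged predicted noise reflects randomness in the traffic itself.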
Abstract
Ensuring safety in autonomous driving (AD) remains a significant challenge, especially in highly dynamic and complex traffic environments where diverse agents interact and unexpected hazards frequently emerge. Traditional reinforcement learning (RL) methods often struggle to balance safety, efficiency, and adaptability because they focus primarily on reward maximization without explicitly modeling risk or safety constraints. To address these limitations, this study proposes a novel game-theoretic risk-shaped RL (GTR2L) framework for safe AD. GTR2L incorporates a multi-level game-theoretic world model that jointly predicts the interactive behaviors of surrounding vehicles and their associated risks, along with an adaptive rollout horizon that adjusts dynamically based on predictive uncertainty. Furthermore, an uncertainty-aware barrier mechanism enables flexible modulation of safety boundaries. A dedicated risk modeling approach is also proposed that explicitly captures both epistemic and aleatoric uncertainty to guide constrained policy optimization and enhance decision-making in complex environments. Extensive evaluations across diverse, safety-critical traffic scenarios show that GTR2L significantly outperforms state-of-the-art baselines and human drivers in success rate, collision and violation reduction, and driving efficiency. The code is available at https://github.com/DanielHu197/GTR2L.
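As an illustration of how a rollout horizon could adapt to predictive uncertainty, the sketch below stops a model rollout once accumulated uncertainty exceeds a budget. The stopping rule, the function name `adaptive_horizon`, and the parameters `budget`, `h_min`, and `h_max` are assumptions for this example; the rule used by GTR2L may differ.

```python
def adaptive_horizon(step_uncertainties, budget, h_min=1, h_max=10):
    """Pick a rollout length from per-step predictive uncertainties.

    step_uncertainties[k] is the (non-negative) uncertainty of the world
    model's prediction at rollout step k (hypothetical input). The rollout
    is cut before the step at which the running sum would exceed `budget`,
    and the result is kept within [h_min, h_max].
    """
    total = 0.0
    horizon = 0
    for u in step_uncertainties[:h_max]:
        total += u
        if total > budget:
            break  # predictions past this step are deemed too unreliable
        horizon += 1
    return max(h_min, horizon)


# Confident model: long rollout. Uncertain model: fall back to h_min.
print(adaptive_horizon([0.01] * 20, budget=0.5))
print(adaptive_horizon([0.9, 0.9, 0.9], budget=0.5))
```

With this rule, a confident model plans further ahead, while high uncertainty shortens the rollout so that compounding prediction errors have less influence on the policy update.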
