A Champion-level Vision-based Reinforcement Learning Agent for Competitive Racing in Gran Turismo 7
Hojoon Lee, Takuma Seno, Jun Jet Tai, Kaushik Subramanian, Kenta Kawamoto, Peter Stone, Peter R. Wurman
TL;DR
This work addresses the gap in real-world applicability of deep RL for autonomous racing by presenting a vision-based agent that operates using ego-centric camera data and onboard sensors, without inference-time global localization. It introduces an asymmetric actor-critic architecture trained with QR-SAC, where the actor uses local vision and proprioception while the critic leverages global features during training to improve policy quality. The agent achieves champion-level performance against GT7's built-in AI across three tracks, outperforming human champions in several scenarios and surpassing GT Sophy in others, with ablations confirming the importance of memory and global-information utilization during training. The results highlight the potential of vision-based reinforcement learning for high-speed, multi-agent racing and pave the way for practical deployment with reduced dependence on external localization and instrumentation.
Abstract
Deep reinforcement learning has achieved superhuman racing performance in high-fidelity simulators like Gran Turismo 7 (GT7). However, such agents typically rely on global features that require instrumentation external to the car, such as precise localization of the agent and its opponents, which limits real-world applicability. To address this limitation, we introduce a vision-based autonomous racing agent that relies solely on ego-centric camera views and onboard sensor data, eliminating the need for precise localization during inference. This agent employs an asymmetric actor-critic framework: the actor uses a recurrent neural network over sensor data local to the car to retain track layouts and opponent positions, while the critic accesses the global features during training. Evaluated in GT7, our agent consistently outperforms GT7's built-in AI drivers. To our knowledge, this work presents the first vision-based autonomous racing agent to demonstrate champion-level performance in competitive racing scenarios.
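The information asymmetry described above can be illustrated with a minimal sketch. It is not the architecture from the paper. It is a toy, standard-library-only example in which a flat vector stands in for encoded camera and sensor input, a single tanh recurrence stands in for the recurrent network, and the critic's multiple outputs stand in for a quantile-style value estimate. All class names, dimensions, and the two-dimensional action are hypothetical choices made for illustration.

```python
import math
import random


def init_layer(rng, n_in, n_out):
    """Randomly initialise a dense layer (hypothetical helper)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Apply a dense layer; each row of weights produces one output."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class RecurrentActor:
    """Policy that sees only local observations and keeps a hidden state."""

    def __init__(self, rng, local_dim, hidden_dim, action_dim):
        self.hidden_dim = hidden_dim
        self.w_h, self.b_h = init_layer(rng, local_dim + hidden_dim, hidden_dim)
        self.w_a, self.b_a = init_layer(rng, hidden_dim, action_dim)

    def initial_state(self):
        return [0.0] * self.hidden_dim

    def step(self, local_obs, hidden):
        # The hidden state is the actor's only memory of past observations.
        pre = linear(self.w_h, self.b_h, local_obs + hidden)
        new_hidden = [math.tanh(v) for v in pre]
        # Actions are squashed into [-1, 1].
        action = [math.tanh(v) for v in linear(self.w_a, self.b_a, new_hidden)]
        return action, new_hidden


class PrivilegedCritic:
    """Training-only value estimator that also receives global features."""

    def __init__(self, rng, local_dim, global_dim, action_dim, n_quantiles):
        n_in = local_dim + global_dim + action_dim
        self.w, self.b = init_layer(rng, n_in, n_quantiles)

    def quantiles(self, local_obs, global_feats, action):
        return linear(self.w, self.b, local_obs + global_feats + action)


# Assumed toy dimensions; none of these come from the paper.
LOCAL_DIM, GLOBAL_DIM, HIDDEN_DIM, ACTION_DIM, N_QUANTILES = 6, 4, 8, 2, 5

rng = random.Random(0)
actor = RecurrentActor(rng, LOCAL_DIM, HIDDEN_DIM, ACTION_DIM)
critic = PrivilegedCritic(rng, LOCAL_DIM, GLOBAL_DIM, ACTION_DIM, N_QUANTILES)

hidden = actor.initial_state()
local_obs = [rng.random() for _ in range(LOCAL_DIM)]      # stand-in for vision + onboard sensors
global_feats = [rng.random() for _ in range(GLOBAL_DIM)]  # stand-in for localization data

# Inference path: only local observations and the hidden state are needed.
action, hidden = actor.step(local_obs, hidden)

# Training path: the critic additionally consumes the global features.
q_values = critic.quantiles(local_obs, global_feats, action)
```

In this sketch only `RecurrentActor.step` would be needed at inference time, so the global features never have to be available on the car. The critic can be discarded once training ends.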
