Automated Reward Design for Gran Turismo
Michel Ma, Takuma Seno, Kaushik Subramanian, Peter R. Wurman, Peter Stone, Craig Sherstan
TL;DR
This paper tackles reward design in reinforcement learning for a complex racing simulation by introducing an iterative, LLM- and VLM-assisted framework that converts textual goals into executable reward functions. It replaces costly fitness metrics with preferences from vision-language models and uses a trajectory alignment coefficient to prune misaligned candidate rewards, enabling an automated search over reward functions. Empirical results show agents competitive with GT Sophy, and the approach yields novel behaviors while remaining applicable beyond Gran Turismo 7. The work highlights practical gains in automated reward design while noting its computational demands and the continued need for human-in-the-loop guidance to achieve stable performance.
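The TL;DR mentions pruning misaligned rewards with a trajectory alignment coefficient. The sketch below shows one way such a filter could work. It assumes the coefficient is a Kendall tau-b style rank agreement between the discounted returns that a candidate reward assigns to trajectories and pairwise preference labels (from humans or a VLM). The names `alignment_coefficient` and `prune`, the discount `gamma=0.99`, the pruning threshold of 0, and the toy speed/off-track trajectories are hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import Callable, Sequence

# Hypothetical representation: a step is a tuple (speed, off_track_flag),
# a trajectory is a sequence of steps, a reward function maps a step to a float.
Step = tuple
Trajectory = Sequence[Step]
RewardFn = Callable[[Step], float]


def discounted_return(traj: Trajectory, reward_fn: RewardFn, gamma: float = 0.99) -> float:
    """Discounted return of one trajectory under a candidate reward function."""
    return sum(gamma ** t * reward_fn(step) for t, step in enumerate(traj))


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def alignment_coefficient(
    trajs: Sequence[Trajectory],
    prefs: Sequence[tuple],
    reward_fn: RewardFn,
    gamma: float = 0.99,
) -> float:
    """Assumed Kendall tau-b style agreement between returns and preferences.

    Each preference (i, j, label) says trajectory i is preferred (+1),
    trajectory j is preferred (-1), or both are equally good (0).
    """
    returns = [discounted_return(t, reward_fn, gamma) for t in trajs]
    concordant = discordant = pref_untied = ret_untied = 0
    for i, j, label in prefs:
        r = sign(returns[i] - returns[j])
        pref_untied += label != 0  # pairs not tied in the preference labels
        ret_untied += r != 0       # pairs not tied in the returns
        if label * r > 0:
            concordant += 1
        elif label * r < 0:
            discordant += 1
    denom = math.sqrt(pref_untied * ret_untied)
    return (concordant - discordant) / denom if denom else 0.0


def prune(candidates, trajs, prefs, threshold: float = 0.0):
    """Keep only candidate reward functions whose coefficient exceeds threshold."""
    return [fn for fn in candidates if alignment_coefficient(trajs, prefs, fn) > threshold]


# Toy data (hypothetical): A is fast and clean, B is slow and clean,
# C is fastest but drives off track. Preferences rank A > B > C.
trajs = [
    [(10, 0), (10, 0)],  # A
    [(5, 0), (5, 0)],    # B
    [(12, 1), (12, 1)],  # C
]
prefs = [(0, 1, +1), (0, 2, +1), (1, 2, +1)]

speed_only = lambda s: s[0]              # ignores off-track driving
penalized = lambda s: s[0] - 20 * s[1]   # penalizes leaving the track

if __name__ == "__main__":
    print(alignment_coefficient(trajs, prefs, speed_only))
    print(alignment_coefficient(trajs, prefs, penalized))
```

With these toy inputs, the speed-only reward ranks C first and disagrees with two of the three preferences, giving a coefficient of -1/3. The penalized reward agrees with all three preferences and scores 1.0, so only it survives `prune`. A rank-based score is used here because it compares orderings rather than raw return scales, which differ across candidate reward functions.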
Abstract
When designing reinforcement learning (RL) agents, a designer communicates the desired agent behavior through the definition of reward functions: numerical feedback given to the agent as reward or punishment for its actions. However, mapping desired behaviors to reward functions can be difficult, especially in complex environments such as autonomous racing. In this paper, we demonstrate how current foundation models can effectively search over a space of reward functions to produce desirable RL agents for the Gran Turismo 7 racing game, given only text-based instructions. Through a combination of LLM-based reward generation, VLM preference-based evaluation, and human feedback, we demonstrate how our system can be used to produce racing agents competitive with GT Sophy, a champion-level RL racing agent, as well as to generate novel behaviors, paving the way for practical automated reward design in real-world applications.
