Generating and Evolving Reward Functions for Highway Driving with Large Language Models

Xu Han; Qiannan Yang; Xianda Chen; Xiaowen Chu; Meixin Zhu

TL;DR

This work targets the labor-intensive process of designing reward functions for autonomous highway driving. It introduces an LLM-assisted framework that generates reward-function code and then evolves it. In each iteration, the LLM proposes several reward candidates, each candidate is used to train an agent in a fixed DRL setup, and the LLM reflects on the resulting performance metrics to produce refined candidates. The underlying learning problem is the standard Bellman optimality relation $Q^*(s,a)=\mathbb{E}\left[R_{t+1}+\gamma\max_{a'}Q^*(S_{t+1},a')\mid S_t=s, A_t=a\right]$. In highway-env simulations across three traffic configurations, the approach surpasses expert handcrafted rewards, achieving a 22% higher average success rate, which indicates safer driving and suggests productivity gains in reward design. The contributions are an LLM-driven framework for reward generation and evolution, a prompt template tailored to complex driving simulations, and empirical evidence of improved performance and generalization over human-designed rewards.
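To make the Bellman relation quoted above concrete, the following minimal sketch computes a one-sample estimate of its right-hand side and applies a tabular Q-learning step toward it. The function names, the step size `alpha`, and the tabular setting are illustrative assumptions; the summary does not specify which DRL algorithm or update rule the authors use.

```python
from typing import Sequence


def bellman_target(reward: float, next_q_values: Sequence[float],
                   gamma: float = 0.99, done: bool = False) -> float:
    """One-sample estimate of R_{t+1} + gamma * max_a' Q(s', a').

    If the episode ended at this step (for example, after a collision),
    there is no next state, so the target is just the reward.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def q_update(q: float, reward: float, next_q_values: Sequence[float],
             alpha: float = 0.1, gamma: float = 0.99,
             done: bool = False) -> float:
    """Move the current estimate Q(s, a) a fraction alpha toward the target."""
    target = bellman_target(reward, next_q_values, gamma, done)
    return q + alpha * (target - q)
```

The reward enters the update only through `reward`, which is why changing the reward function changes the policy that training converges to while the rest of the learning setup stays fixed.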

Abstract

Reinforcement Learning (RL) plays a crucial role in advancing autonomous driving technologies by maximizing reward functions to achieve the optimal policy. However, crafting these reward functions is, in practice, often a complex, manual process. To reduce this complexity, we introduce a novel framework that integrates Large Language Models (LLMs) with RL to improve reward function design in autonomous driving. This framework utilizes the coding capabilities of LLMs, proven in other areas, to generate and evolve reward functions for highway scenarios. The framework starts by instructing LLMs to create initial reward function code based on descriptions of the driving environment and the task. This code is then refined through iterative cycles of RL training and LLM reflection, which benefit from the models' ability to review and improve their own output. We have also developed a specific prompt template to improve LLMs' understanding of complex driving simulations, ensuring the generation of effective and error-free code. Our experiments in a highway driving simulator across three traffic configurations show that our method surpasses expert handcrafted reward functions, achieving a 22% higher average success rate. This not only indicates safer driving but also suggests significant gains in development productivity.
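The generate, train, and reflect cycle described in the abstract can be sketched as a short loop. This is a hedged outline, not the authors' implementation: `evolve_reward`, `Candidate`, and the two callables are hypothetical names, and keeping only the best-scoring candidate as feedback is an assumption about how reflection is driven. The `fake_generate` and `fake_train` stubs stand in for a real LLM call and a real RL training run so that the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    code: str            # reward-function source text proposed by the LLM
    success_rate: float  # metric measured after RL training with that reward


def evolve_reward(
    generate: Callable[[Optional[Candidate]], List[str]],
    train_and_evaluate: Callable[[str], float],
    iterations: int = 3,
) -> Candidate:
    """Run the loop and return the best candidate seen (selection rule assumed)."""
    best: Optional[Candidate] = None
    for _ in range(iterations):
        # First round: best is None, so the generator works from the task and
        # environment description only. Later rounds: it also receives the
        # best candidate and its metric, which stands in for reflection.
        for code in generate(best):
            score = train_and_evaluate(code)
            if best is None or score > best.success_rate:
                best = Candidate(code, score)
    if best is None:
        raise ValueError("no reward candidates were generated")
    return best


# Toy stand-ins (assumptions) so the loop can be exercised without an LLM
# or a simulator. A "reward function" here is just a string "speed_weight=N".
def fake_generate(prev: Optional[Candidate]) -> List[str]:
    base = 0 if prev is None else int(prev.code.split("=")[1])
    return [f"speed_weight={base + 1}", f"speed_weight={base + 2}"]


def fake_train(code: str) -> float:
    weight = int(code.split("=")[1])
    return min(1.0, 0.2 * weight)  # pretend success rate, capped at 1.0


best = evolve_reward(fake_generate, fake_train, iterations=3)
```

In a real setup, `generate` would wrap the prompt template and LLM call, and `train_and_evaluate` would execute the returned code inside the simulator's reward hook before training and measuring the success rate.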

Paper Structure

This paper contains 16 sections, 1 equation, 6 figures, and 1 algorithm.

INTRODUCTION
RELATED WORKS
Large Language Models for Autonomous Driving
Deep Reinforcement Learning
Reward Engineering for Autonomous Driving
PROPOSED APPROACH
Understanding Driving Simulation Environment
Reinforcement Learning for Highway Driving
Reflection and Refinement
EXPERIMENTS AND RESULTS
Experiment Design
Training Details
Results
CONCLUSION
Initial Prompt
...and 1 more section

Figures (6)

Figure 1: Conceptual diagram of the proposed framework. LLMs generate reward function code for driving according to user instructions, using an elaborate prompt template. The results of RL training with the designed reward are then fed back to the LLMs for reflection and reward regeneration, aiming for evolutionary improvements.
Figure 2: Conversation example between user and LLM. The user prompt includes task description and environment source code, while LLM replies with a reward function.
Figure 3: Example of LLM refining the reward function in an iteration.
Figure 4: Performance of generated and human rewards during RL training.
Figure 5: Success rate comparison with human-designed reward in different types of highway environments.
...and 1 more figure
