Improving Agent Behaviors with RL Fine-tuning for Autonomous Driving

Zhenghao Peng; Wenjie Luo; Yiren Lu; Tianyi Shen; Cole Gulino; Ari Seff; Justin Fu

TL;DR

The paper tackles the challenge of reliably modeling autonomous driving agent behaviors under distribution shift by applying a pre-training plus reinforcement-learning fine-tuning paradigm to a Transformer-based autoregressive motion predictor. Using MotionLM as the base model, it performs on-policy RL fine-tuning with a simple reward that balances trajectory realism and collision avoidance, evaluated on the WOMD/WOSAC benchmarks. The authors also introduce a novel policy-evaluation framework that measures how well simulators rank and evaluate autonomous planners, demonstrating that fine-tuned sim agents yield more faithful planner assessments. While results show clear gains in safety-critical metrics and planner evaluation reliability, limitations include a simplified dynamics model and rewards, pointing to future work on integrating realistic low-level control and broader reward shaping.
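To make the reward idea above concrete, here is a minimal sketch of a per-step reward that trades off realism against collision avoidance. It assumes realism is measured as the Euclidean distance between the simulated position and the logged position, and that collisions are given as a boolean flag per step. The function names, the `collision_weight` parameter, and its default value are hypothetical illustrations and are not taken from the paper.

```python
import math


def step_reward(pred_xy, logged_xy, collided, collision_weight=1.0):
    """Hypothetical per-step reward (an assumption, not the paper's exact form).

    Realism term: negative Euclidean distance to the logged position.
    Safety term: a fixed penalty whenever the agent is in collision.
    """
    dist = math.hypot(pred_xy[0] - logged_xy[0], pred_xy[1] - logged_xy[1])
    penalty = collision_weight if collided else 0.0
    return -dist - penalty


def rollout_return(pred_traj, logged_traj, collisions, collision_weight=1.0):
    """Undiscounted sum of per-step rewards over one closed-loop rollout."""
    return sum(
        step_reward(p, g, c, collision_weight)
        for p, g, c in zip(pred_traj, logged_traj, collisions)
    )
```

A larger `collision_weight` pushes the fine-tuned policy toward avoiding collisions at the cost of staying close to the logged trajectory; a smaller one does the reverse.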

Abstract

A major challenge in autonomous vehicle research is modeling agent behaviors, which has critical applications including constructing realistic and reliable simulations for off-board evaluation and forecasting traffic agents' motion for onboard planning. While supervised learning has shown success in modeling agents across various domains, these models can suffer from distribution shift when deployed at test time. In this work, we improve the reliability of agent behaviors by closed-loop fine-tuning of behavior models with reinforcement learning. Our method demonstrates improved overall performance, as well as improved targeted metrics such as collision rate, on the Waymo Open Sim Agents challenge. Additionally, we present a novel policy evaluation benchmark to directly assess the ability of simulated agents to measure the quality of autonomous vehicle planners and demonstrate the effectiveness of our approach on this new benchmark.

Paper Structure (24 sections, 5 equations, 5 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Pre-training and Fine-tuning of Transformer-based Models
Behavior Modeling for Autonomous Driving
Preliminaries
Motion Prediction.
Behavior Modeling as a Multi-Agent RL Problem.
Autoregressive Encoder-Decoder Architecture.
Method
RL Fine-tuning
Policy Evaluation for Sim Agents
Choice of policies.
Reward Function.
Experiments
Dataset.
...and 9 more sections

Figures (5)

Figure 1: We propose to fine-tune a pre-trained motion prediction model with closed-loop reinforcement learning.
Figure 2: Left: The agent is trained from scratch using a combined Behavioral Cloning (BC) and Reinforcement Learning (RL) approach. Without pre-training on large datasets, the agent must simultaneously explore the environment and develop its capabilities from scratch. Right: The agent undergoes a two-phase training scheme. In pre-training (gray), the agent acquires a foundational skill set by aligning its actions (green) with ground truth data. Fine-tuning through RL then refines the agent's behaviors in the autoregressive rollout.
Figure 3: The causal mask in the decoder.
Figure 4: We evaluate the sim agent by its ability to correctly assess the AD planner.
Figure 5: Visualization of scenario rollouts using a pre-trained and a fine-tuned model. The start locations of vehicles are marked with a red star, the ground truth futures are marked with a solid black line, and the sampled trajectory is marked with circles of different colors. Left: The pre-trained model suffers from drifting due to distributional shift between training (with teacher forcing) and testing (with an autoregressive rollout). Right: The fine-tuned model is able to follow the ground truth much more precisely, which is quantitatively demonstrated by the better ADE metric.
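The ADE (average displacement error) mentioned in the Figure 5 caption is commonly defined as the mean Euclidean distance between predicted and ground truth positions over all timesteps. The sketch below shows that standard definition for a single agent in 2D; the function name and the list-of-tuples trajectory format are assumptions for illustration, and the paper's exact evaluation protocol may differ.

```python
import math


def average_displacement_error(pred_traj, gt_traj):
    """Mean Euclidean distance between matched timesteps of two 2D trajectories.

    Both trajectories are sequences of (x, y) tuples of equal, non-zero length.
    """
    if len(pred_traj) != len(gt_traj) or len(pred_traj) == 0:
        raise ValueError("trajectories must be non-empty and the same length")
    total = sum(
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(pred_traj, gt_traj)
    )
    return total / len(pred_traj)
```

A rollout that drifts away from the logged future accumulates larger per-step distances, so its ADE grows with the amount of drift.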
