BeTAIL: Behavior Transformer Adversarial Imitation Learning from Human Racing Gameplay

Catherine Weaver; Chen Tang; Ce Hao; Kenta Kawamoto; Masayoshi Tomizuka; Wei Zhan

TL;DR

BeTAIL addresses imitation learning for autonomous racing where human decisions are complex and non-Markovian. It pretrains a Behavior Transformer on offline demonstrations and then refines it online with a residual Adversarial Imitation Learning policy to correct distribution shifts. Across Lago Maggiore, Dragon Tail, and Mount Panorama, BeTAIL achieves faster, smoother laps and better stability than BeT or AIL alone, including transfer to unseen tracks. The approach demonstrates that combining sequence modeling with occupancy-matching imitation can robustly recover non-Markovian human-like decision-making in a high-fidelity racing simulator.

Abstract

Imitation learning learns a policy from demonstrations without requiring hand-designed reward functions. In many robotic tasks, such as autonomous racing, imitated policies must model complex environment dynamics and human decision-making. Sequence modeling is highly effective in capturing intricate patterns of motion sequences but struggles to adapt to new environments or distribution shifts that are common in real-world robotics tasks. In contrast, Adversarial Imitation Learning (AIL) can mitigate this effect, but struggles with sample inefficiency and handling complex motion patterns. Thus, we propose BeTAIL: Behavior Transformer Adversarial Imitation Learning, which combines a Behavior Transformer (BeT) policy from human demonstrations with online AIL. BeTAIL adds an AIL residual policy to the BeT policy to model the sequential decision-making process of human experts and correct for out-of-distribution states or shifts in environment dynamics. We test BeTAIL on three challenges with expert-level demonstrations of real human gameplay in Gran Turismo Sport. Our proposed residual BeTAIL reduces environment interactions and improves racing performance and stability, even when the BeT is pretrained on different tracks than downstream learning. Videos and code available at: https://sites.google.com/berkeley.edu/BeTAIL/home.

Paper Structure (24 sections, 9 equations, 4 figures, 1 table)

Introduction
Related Works
Behavior Modeling
Curriculum Learning and Guided Learning
Sequence Modeling
Preliminaries
Problem Statement
Unimodal Decision Transformer
Behavior Transformer-Assisted Adversarial Imitation Learning
Behavior Transformer (BeT) Pretraining
Residual Policy Learning for Online Fine-tuning
Residual Policy Training with AIL
Imitation of Human Racing Gameplay
State Feature Extraction and Actions
Environment and Data Collection
...and 9 more sections

Figures (4)

Figure 1: BeTAIL rollout collection. The pre-trained BeT predicts action $\hat{a}_t$ from the last $H$ state-actions. Then the residual policy specifies action $\tilde{a}_t$ from the current state and $\hat{a}_t$, and the agent executes $a_t=\hat{a}_t+\tilde{a}_t$ in the environment.
Figure 2: Agent trajectories on Lago Maggiore. We deliberately set AIL and BeTAIL to start at a lower initial speed than the human. A car drawing is placed at the vehicle's location and heading every 0.4 s. See the website for the animated version.
Figure 3: Experimental results on three racing challenges. (a) The Lago Maggiore challenge pretrains the BeT on the same demonstrations and environment used for downstream learning. (b) The Dragon Tail challenge transfers the BeT policy to a new track with BeTAIL finetuning. (c) The Mount Panorama challenge pretrains the BeT on a library of 4 tracks, and BeTAIL finetunes on an unseen track. (d)-(f) Evaluation of mean (std) success rate for finishing laps and mean (std) lap time. (g)-(i) The best policy's mean $\pm$ std lap time and change in steering from the previous time step. (8M steps $\approx$ 25 hours with 20 cars collecting data.)
Figure 4: Ablation study on Lago Maggiore (Fig. \ref{fig:laptime}a). SAC trains a Markov policy, replacing the AIL reward with the reward in \ref{eq:rl_reward}. BeTSAC(0.05) replaces the AIL residual policy finetuning step with SAC finetuning using \ref{eq:rl_reward}. 3 seeds unless noted.
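
The rollout step in the Figure 1 caption, $a_t=\hat{a}_t+\tilde{a}_t$, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `betail_step`, the callable interfaces `bet_policy(history, state)` and `residual_policy(state, a_hat)`, the optional `residual_limit` bound, and the toy policies are all hypothetical names and assumptions introduced here only to show how a base action and a residual action are combined while a window of the last $H$ state-action pairs is maintained.

```python
from collections import deque

import numpy as np


def betail_step(bet_policy, residual_policy, history, state, residual_limit=None):
    """Run one rollout step: a_t = a_hat_t + a_tilde_t.

    bet_policy(history, state) -> base action a_hat_t (hypothetical interface)
    residual_policy(state, a_hat) -> residual a_tilde_t (hypothetical interface)
    history: deque holding the last H (state, action) pairs
    residual_limit: optional bound on |a_tilde_t| (an assumption, not from the source)
    """
    a_hat = np.asarray(bet_policy(list(history), state), dtype=float)
    a_tilde = np.asarray(residual_policy(state, a_hat), dtype=float)
    if residual_limit is not None:
        # Keep the correction small relative to the base action (assumed design choice).
        a_tilde = np.clip(a_tilde, -residual_limit, residual_limit)
    action = a_hat + a_tilde
    history.append((state, action))  # deque(maxlen=H) drops the oldest pair
    return action


# Toy stand-ins for the pretrained BeT and the residual policy (illustrative only).
def toy_bet(hist, state):
    # Repeat the previously executed action, or output zeros at the first step.
    return hist[-1][1] if hist else np.zeros(2)


def toy_residual(state, a_hat):
    # A constant correction, larger than the limit so that clipping is exercised.
    return 0.1 * np.ones_like(a_hat)


H = 4
history = deque(maxlen=H)
for t in range(6):
    state = np.full(3, float(t))
    action = betail_step(toy_bet, toy_residual, history, state, residual_limit=0.05)
```

In this sketch the base policy sees only the bounded history window, while the residual policy sees the current state and the base action, mirroring the information flow the caption describes. Bounding the residual is one plausible way to keep the combined action close to the pretrained behavior; whether and how the paper bounds it is not stated in this section.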
