Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models

Zhejun Zhang, Peter Karkus, Maximilian Igl, Wenhao Ding, Yuxiao Chen, Boris Ivanovic, Marco Pavone

TL;DR

This work tackles covariate shift and multimodality in tokenized, multi-agent traffic policies when moving from open-loop training to closed-loop evaluation. It introduces Closest Among Top-K (CAT-K) rollouts. During fine-tuning, each step restricts the choice to the top-$K$ actions under the policy and greedily picks the one whose resulting state is closest to the ground-truth next state. This enables closed-loop supervised training without reinforcement learning. A two-stage pipeline of behavior cloning (BC) pre-training followed by CAT-K closed-loop fine-tuning yields a compact 7M-parameter SMART policy that outperforms a 102M-parameter baseline and achieves state-of-the-art performance on the Waymo Open Sim Agents Challenge. The approach also improves a Gaussian Mixture Model ego-policy in an ego-vehicle task, indicating broad applicability to multimodal imitation learning across both discrete-token and continuous action spaces.
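The per-step selection rule can be sketched for a single agent at a single time step. This is a minimal illustration, not the released implementation. It assumes each action token is a 2D displacement added to the agent's position. The 5-token vocabulary, the probabilities, and the helper names (`top_k_indices`, `cat_k_select`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def top_k_indices(probs: Sequence[float], k: int) -> List[int]:
    """Indices of the k most likely tokens, most likely first."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def cat_k_select(
    probs: Sequence[float],
    next_states: Sequence[Point],
    gt_next_state: Point,
    k: int,
) -> int:
    """Closest Among Top-K: keep the k most likely tokens under the policy,
    then return the one whose next state is closest to the GT next state."""
    candidates = top_k_indices(probs, k)
    return min(candidates, key=lambda i: math.dist(next_states[i], gt_next_state))


# Hypothetical vocabulary of displacement tokens (assumption, for illustration only):
#                      straight    left        right        stop        hard left
TOKENS: List[Point] = [(1.0, 0.0), (0.7, 0.7), (0.7, -0.7), (0.0, 0.0), (0.0, 1.0)]
probs = [0.40, 0.20, 0.25, 0.10, 0.05]  # the policy prefers going straight
state = (0.0, 0.0)
next_states = [(state[0] + dx, state[1] + dy) for dx, dy in TOKENS]
gt_next = (0.7, 0.7)  # the GT agent turns left

print(cat_k_select(probs, next_states, gt_next, k=1))  # 0: plain argmax goes straight
print(cat_k_select(probs, next_states, gt_next, k=3))  # 1: "left" is in the top-3 and closest to GT
```

With `k=1` the rule reduces to greedy decoding. With `k` equal to the vocabulary size, it becomes a pure nearest-to-GT choice that ignores the policy. Intermediate values keep each rollout step both likely under the policy and close to the GT mode.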

Abstract

Traffic simulation aims to learn a policy for traffic agents that, when unrolled in closed-loop, faithfully recovers the joint distribution of trajectories observed in the real world. Inspired by large language models, tokenized multi-agent policies have recently become the state-of-the-art in traffic simulation. However, they are typically trained through open-loop behavior cloning, and thus suffer from covariate shift when executed in closed-loop during simulation. In this work, we present Closest Among Top-K (CAT-K) rollouts, a simple yet effective closed-loop fine-tuning strategy to mitigate covariate shift. CAT-K fine-tuning only requires existing trajectory data, without reinforcement learning or generative adversarial imitation. Concretely, CAT-K fine-tuning enables a small 7M-parameter tokenized traffic simulation policy to outperform a 102M-parameter model from the same model family, achieving the top spot on the Waymo Sim Agent Challenge leaderboard at the time of submission. The code is available at https://github.com/NVlabs/catk.
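The claim that fine-tuning needs only existing trajectory data can be made concrete with a short sketch of one closed-loop supervised pass for a single agent. The following are assumptions rather than details stated above. Positions are 2D and tokens are displacements. The policy is any callable that returns token probabilities for the current rolled-out state (`toy_policy` is hypothetical). The label at each step is the token that moves the rolled-out state closest to the next GT state. The sketch computes only the loss value; a real training loop would backpropagate it through the policy network.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Policy = Callable[[Point], List[float]]  # hypothetical: state -> token probabilities


def apply(state: Point, token: Point) -> Point:
    return (state[0] + token[0], state[1] + token[1])


def closed_loop_sft_loss(
    policy: Policy, tokens: Sequence[Point], gt: Sequence[Point], k: int
) -> Tuple[float, List[Point]]:
    """Mean cross-entropy along a CAT-K rollout that starts at the first GT state."""
    state, rollout, nll = gt[0], [gt[0]], 0.0
    for target in gt[1:]:
        probs = policy(state)  # the policy sees its own rolled-out state, not the GT state
        dist = [math.dist(apply(state, tok), target) for tok in tokens]
        # Assumed label: the token that brings the rolled-out state closest to the GT.
        label = min(range(len(tokens)), key=dist.__getitem__)
        nll -= math.log(probs[label])
        # CAT-K step: among the k most likely tokens, take the one closest to the GT.
        top_k = sorted(range(len(tokens)), key=probs.__getitem__, reverse=True)[:k]
        state = apply(state, tokens[min(top_k, key=dist.__getitem__)])
        rollout.append(state)
    return nll / (len(gt) - 1), rollout


TOKENS = [(1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]  # hypothetical: forward, left, stay


def toy_policy(state: Point) -> List[float]:
    return [0.6, 0.3, 0.1]  # hypothetical fixed preference for "forward"


gt = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
loss, rollout = closed_loop_sft_loss(toy_policy, TOKENS, gt, k=2)
print(rollout)  # [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
_, greedy = closed_loop_sft_loss(toy_policy, TOKENS, gt, k=1)
print(greedy[-1])  # (3.0, 0.0): greedy rollout drifts away from the GT
```

Under these assumptions, the label is recomputed from wherever the rollout actually is. Supervision therefore always points back toward the GT, and CAT-K keeps the rollout close enough to the GT for that signal to stay meaningful.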

Paper Structure

This paper contains 27 sections, 6 equations, 6 figures, 6 tables, and 2 algorithms.

Figures (6)

  • Figure 1: Closest Among Top-K (CAT-K) rollouts. The key idea of our approach is to unroll the policy during fine-tuning so that the visited states remain close to the ground truth (GT). At each time step, CAT-K first takes the top-K most likely action tokens according to the policy, then chooses the one leading to the state closest to the GT. As a result, CAT-K rollouts follow the mode of the GT (e.g., turning left), while random or top-K rollouts can deviate substantially (e.g., by going straight or turning right). Because the policy is essentially trained to minimize the distance between the rollout states and the GT states, GT-based supervision remains effective for CAT-K rollouts but not for random or top-K rollouts.
  • Figure 2: Schematic comparison of CAT-K rollouts, top-K sampling, and the data augmentation techniques of Trajeglish and SMART. In this example, the token vocabulary has size 5, and we roll out three steps from $t=0$ to $t=3$. For CAT-K rollouts and top-K sampling, the top-K is taken w.r.t. the token probabilities $p$ predicted by the policy. For the data augmentations used by Trajeglish and SMART, the policy is unavailable, and the top-K selection is based on the negative distances between tokens and the GT.
  • Figure 3: Influence of $K_\text{infer}$ on inference-time top-K sampling.
  • Figure 4: On-server vs. local evaluation of SMART-tiny.
  • Figure 5: ADE between CAT-K rollouts and GT trajectories.
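Figures 2, 3, and 5 contrast per-step selection rules and measure the average displacement error (ADE) of rollouts. The sketch below places the three rules from Figure 2 side by side for a single agent and computes ADE against the GT. It rests on several assumptions not taken from the figures. Tokens are 2D displacements and the policy is a plain callable. Top-K sampling draws from the renormalized policy probabilities (this is also the inference-time rule governed by $K_\text{infer}$). The Trajeglish/SMART-style augmentation, which never queries the policy, samples uniformly among the K tokens closest to the GT. All function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical rule signature: (probs, next_states, gt_next, k, rng) -> chosen token index
Rule = Callable[[Sequence[float], Sequence[Point], Point, int, random.Random], int]


def _top_k(scores: Sequence[float], k: int) -> List[int]:
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]


def cat_k(probs, nxt, gt_next, k, rng):
    # Top-K by policy probability, then the deterministic closest-to-GT choice.
    return min(_top_k(probs, k), key=lambda i: math.dist(nxt[i], gt_next))


def top_k_sampling(probs, nxt, gt_next, k, rng):
    # Top-K by policy probability, then sample proportionally to it; the GT is unused.
    top = _top_k(probs, k)
    return rng.choices(top, weights=[probs[i] for i in top])[0]


def distance_augmentation(probs, nxt, gt_next, k, rng):
    # The policy is unused: top-K by negative distance to the GT, then a uniform draw (assumption).
    return rng.choice(_top_k([-math.dist(p, gt_next) for p in nxt], k))


def rollout(rule: Rule, policy, tokens, gt, k, seed=0) -> List[Point]:
    rng = random.Random(seed)
    states = [gt[0]]
    for gt_next in gt[1:]:
        s = states[-1]
        nxt = [(s[0] + dx, s[1] + dy) for dx, dy in tokens]
        states.append(nxt[rule(policy(s), nxt, gt_next, k, rng)])
    return states


def ade(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Average displacement error: mean Euclidean distance over the given steps."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


# Hypothetical tokens (straight, left, right, stop, hard left) and a fixed toy policy.
TOKENS = [(1.0, 0.0), (0.5, 0.5), (0.5, -0.5), (0.0, 0.0), (0.0, 1.0)]
toy_policy = lambda state: [0.40, 0.20, 0.25, 0.10, 0.05]
gt = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0), (1.5, 1.5)]  # the GT agent keeps turning left

cat = rollout(cat_k, toy_policy, TOKENS, gt, k=3)
greedy = rollout(cat_k, toy_policy, TOKENS, gt, k=1)
print(f"{ade(cat[1:], gt[1:]):.3f}")     # 0.000: CAT-K follows the GT mode
print(f"{ade(greedy[1:], gt[1:]):.3f}")  # 1.414: greedy keeps going straight
```

Of the three rules, only CAT-K uses both the policy and the GT. Top-K sampling ignores the GT, while the augmentation rule ignores the policy, which matches the distinction drawn in Figure 2.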