ChatDyn: Language-Driven Multi-Actor Dynamics Generation in Street Scenes

Yuxi Wei, Jingbo Wang, Yuwen Du, Dingju Wang, Liang Pan, Chenxin Xu, Yao Feng, Bo Dai, Siheng Chen

TL;DR

ChatDyn tackles the problem of generating realistic, instruction-driven dynamics for multiple street participants by integrating language-based high-level planning with physics-based low-level control. It employs a two-stage architecture: multi-LLM-agent role-playing for planning and PedExecutor/VehExecutor policies for execution. The paper contributes a unified PedExecutor capable of handling diverse pedestrian tasks with hierarchical priors and body-masked AMP, a VehExecutor that enforces physical consistency via a bicycle model and history-aware state, and comprehensive experiments showing improved realism, interaction fidelity, and controllability over prior work. This framework enables more realistic and controllable street scene simulations for training and evaluating autonomous driving systems.
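To make the "bicycle model" mentioned above concrete, here is a minimal sketch of a standard kinematic bicycle transition in Python. It is not taken from the ChatDyn code: the rear-axle reference point, the names `VehicleState` and `bicycle_step`, and the `wheelbase` and `dt` defaults are all assumptions chosen for illustration. The sketch only shows how a state transition of this kind ties heading changes to speed and steering angle, which is what keeps generated vehicle motion physically consistent.

```python
import math
from dataclasses import dataclass


@dataclass
class VehicleState:
    """Hypothetical vehicle state; field names are illustrative assumptions."""
    x: float        # rear-axle x position (m)
    y: float        # rear-axle y position (m)
    heading: float  # yaw angle (rad)
    speed: float    # longitudinal speed (m/s)


def bicycle_step(state, accel, steer, wheelbase=2.7, dt=0.1):
    """Advance one Euler step of the kinematic bicycle model.

    accel is longitudinal acceleration (m/s^2) and steer is the front-wheel
    steering angle (rad). wheelbase and dt are assumed values, not values
    reported by the paper.
    """
    x = state.x + state.speed * math.cos(state.heading) * dt
    y = state.y + state.speed * math.sin(state.heading) * dt
    # The yaw rate depends on speed and steering, so a stationary car cannot turn.
    heading = state.heading + state.speed / wheelbase * math.tan(steer) * dt
    # Clamp at zero so this simplified sketch never drives in reverse.
    speed = max(0.0, state.speed + accel * dt)
    return VehicleState(x, y, heading, speed)


if __name__ == "__main__":
    state = VehicleState(x=0.0, y=0.0, heading=0.0, speed=10.0)
    for _ in range(10):
        state = bicycle_step(state, accel=0.0, steer=0.05)
    print(state)
```

A learned policy would output `accel` and `steer` at each step rather than positions directly, so every generated trajectory is reachable under this transition by construction.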

Abstract

Generating realistic and interactive dynamics of traffic participants according to specific instructions is critical for street scene simulation. However, there is currently no comprehensive method that generates realistic dynamics for different types of participants, including vehicles and pedestrians, with different kinds of interactions between them. In this paper, we introduce ChatDyn, the first system capable of generating interactive, controllable and realistic participant dynamics in street scenes based on language instructions. To achieve precise control through complex language, ChatDyn employs a multi-LLM-agent role-playing approach, which utilizes natural language inputs to plan the trajectories and behaviors of different traffic participants. To generate realistic fine-grained dynamics based on the planning, ChatDyn designs two novel executors: the PedExecutor, a unified multi-task executor that generates realistic pedestrian dynamics under different task plans; and the VehExecutor, a physical transition-based policy that generates physically plausible vehicle dynamics. Extensive experiments show that ChatDyn can generate realistic driving scene dynamics with multiple vehicles and pedestrians, and significantly outperforms previous methods on subtasks. Code and model will be available at https://vfishc.github.io/chatdyn.

Paper Structure

This paper contains 33 sections, 1 equation, 16 figures, 10 tables.

Figures (16)

  • Figure 1: ChatDyn achieves interactive and realistic language-driven multi-actor dynamics generation in street scenes.
  • Figure 2: System overview. ChatDyn adopts multi-LLM-agent role-playing for precise high-level planning. Two specialized executors are designed for realistic low-level generation.
  • Figure 3: Pedestrian executor (PedExecutor) framework. With multi-task unified training, PedExecutor achieves unified control over various tasks, including following, imitation, and interaction. It generates realistic pedestrian dynamics by executing the tasks derived from planning. Hierarchical control and AMP with a body mask provide priors for the action space and the reward space, improving the realism of the output dynamics.
  • Figure 4: Vehicle executor (VehExecutor) framework. VehExecutor adopts goal-conditioned RL based on the physical transition of a real vehicle. Combined with a history-aware design, VehExecutor generates realistic vehicle dynamics that follow the planned trajectory.
  • Figure 5: System results under complex and composite commands, with diverse interaction information and realistic dynamics output.
  • ...and 11 more figures