ChatDyn: Language-Driven Multi-Actor Dynamics Generation in Street Scenes
Yuxi Wei, Jingbo Wang, Yuwen Du, Dingju Wang, Liang Pan, Chenxin Xu, Yao Feng, Bo Dai, Siheng Chen
TL;DR
ChatDyn tackles the problem of generating realistic, instruction-driven dynamics for multiple street participants by integrating language-based high-level planning with physics-based low-level control. It employs a two-stage architecture: multi-LLM-agent role-playing for planning and PedExecutor/VehExecutor policies for execution. The paper contributes a unified PedExecutor capable of handling diverse pedestrian tasks with hierarchical priors and body-masked AMP, a VehExecutor that enforces physical consistency via a bicycle model and history-aware state, and comprehensive experiments showing improved realism, interaction fidelity, and controllability over prior work. This framework enables more realistic and controllable street scene simulations for training and evaluating autonomous driving systems.
Abstract
Generating realistic and interactive dynamics of traffic participants according to specific instructions is critical for street scene simulation. However, no comprehensive method currently generates realistic dynamics for different types of participants, including vehicles and pedestrians, together with the different kinds of interactions between them. In this paper, we introduce ChatDyn, the first system capable of generating interactive, controllable and realistic participant dynamics in street scenes based on language instructions. To achieve precise control through complex language, ChatDyn employs a multi-LLM-agent role-playing approach, which uses natural language inputs to plan the trajectories and behaviors of different traffic participants. To generate realistic fine-grained dynamics from these plans, ChatDyn designs two novel executors: the PedExecutor, a unified multi-task executor that generates realistic pedestrian dynamics under different task plans; and the VehExecutor, a physical transition-based policy that generates physically plausible vehicle dynamics. Extensive experiments show that ChatDyn can generate realistic driving scene dynamics with multiple vehicles and pedestrians, and that it significantly outperforms previous methods on the individual subtasks. Code and model will be available at https://vfishc.github.io/chatdyn.
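To make the idea of a physical transition concrete, the sketch below shows a standard kinematic bicycle model step of the kind the TL;DR refers to for the VehExecutor. It is an illustration only, not the authors' implementation: the names `VehicleState`, `bicycle_step` and `rollout`, the rear-axle reference point, the forward-Euler integration, and the default wheelbase, time step and steering limit are all assumptions made here. The point is that when a policy outputs only acceleration and steering, every resulting state is reachable by a real vehicle.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VehicleState:
    """Hypothetical vehicle state: rear-axle position, heading (rad), speed (m/s)."""
    x: float
    y: float
    heading: float
    speed: float


def bicycle_step(state, accel, steer, dt=0.1, wheelbase=2.7, max_steer=0.6):
    """Advance the state by one forward-Euler step of the kinematic bicycle model.

    accel is longitudinal acceleration (m/s^2) and steer is the front-wheel
    angle (rad). The default dt, wheelbase and max_steer are assumed values.
    """
    # Clamp the steering angle to the assumed mechanical limit.
    steer = max(-max_steer, min(max_steer, steer))
    # Position advances along the current heading at the current speed.
    x = state.x + state.speed * math.cos(state.heading) * dt
    y = state.y + state.speed * math.sin(state.heading) * dt
    # Yaw rate of the kinematic bicycle model: v / L * tan(delta).
    heading = state.heading + state.speed / wheelbase * math.tan(steer) * dt
    # Assumption for this sketch: no reversing, so speed is floored at zero.
    speed = max(0.0, state.speed + accel * dt)
    return VehicleState(x, y, heading, speed)


def rollout(state, actions, **kwargs):
    """Apply a sequence of (accel, steer) actions and return every visited state."""
    states = [state]
    for accel, steer in actions:
        state = bicycle_step(state, accel, steer, **kwargs)
        states.append(state)
    return states


if __name__ == "__main__":
    start = VehicleState(x=0.0, y=0.0, heading=0.0, speed=5.0)
    # Gentle left turn while accelerating, for two seconds at dt = 0.1 s.
    for s in rollout(start, [(1.0, 0.1)] * 20)[::5]:
        print(f"x={s.x:.2f} y={s.y:.2f} heading={s.heading:.3f} speed={s.speed:.2f}")
```

In a setup like this, a learned policy would choose the `(accel, steer)` pair at each step, and the transition function, rather than the network, would guarantee that the trajectory stays physically consistent.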
