
LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences

Ao Liang, Youquan Liu, Yu Yang, Dongyue Lu, Linfeng Li, Lingdong Kong, Huaici Zhao, Wei Tsang Ooi

TL;DR

LiDARCrafter addresses the need for controllable, temporally coherent 4D LiDAR generation by proposing a three-stage framework that converts natural language into an editable 4D layout (Text2Layout), renders a high-fidelity static frame (Layout2Scene), and autoregressively synthesizes the full LiDAR sequence (Scene2Seq). An explicit 4D layout and a comprehensive EvalSuite enable fine-grained control and standardized benchmarking across scene, object, and sequence levels. Experiments on nuScenes demonstrate state-of-the-art fidelity, controllability, and temporal coherence, supporting applications in data augmentation, simulation, and safety-critical scenario testing. The work also provides a public benchmark and codebase to promote reproducibility and broader adoption in autonomous driving research.
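The TL;DR rests on an explicit 4D layout that can be edited before any points are generated. The sketch below shows what such a layout might hold. It assumes each object is a first-frame 3D box plus per-frame (x, y, yaw) waypoints, and that x points forward and y points left. The class names `Box3D`, `LayoutObject`, and `Layout4D` and the waypoint format are hypothetical, not the paper's data format.

```python
from dataclasses import dataclass, field, replace
from typing import List, Tuple

@dataclass(frozen=True)
class Box3D:
    """Hypothetical oriented 3D box in the ego frame (metres, radians)."""
    x: float
    y: float
    z: float
    length: float
    width: float
    height: float
    yaw: float

@dataclass
class LayoutObject:
    """One object in a hypothetical 4D layout: a first-frame box plus future waypoints."""
    category: str
    box: Box3D
    # (x, y, yaw) for frames 1..T-1; size and z are assumed constant over time.
    waypoints: List[Tuple[float, float, float]] = field(default_factory=list)

    def box_at(self, t: int) -> Box3D:
        if t == 0:
            return self.box
        x, y, yaw = self.waypoints[t - 1]
        return replace(self.box, x=x, y=y, yaw=yaw)

@dataclass
class Layout4D:
    num_frames: int
    objects: List[LayoutObject]

    def frame(self, t: int) -> List[Tuple[str, Box3D]]:
        if not 0 <= t < self.num_frames:
            raise IndexError(f"frame {t} outside [0, {self.num_frames})")
        return [(obj.category, obj.box_at(t)) for obj in self.objects]

# A car 10 m ahead and 3.5 m to the left (assumed x-forward, y-left convention),
# driving straight for 4 more frames.
car = LayoutObject(
    "car",
    Box3D(10.0, 3.5, 0.0, 4.5, 1.9, 1.6, 0.0),
    waypoints=[(10.0 + 1.5 * k, 3.5, 0.0) for k in range(1, 5)],
)
layout = Layout4D(num_frames=5, objects=[car])

# Editing the layout: the same car now cuts into the ego lane (y -> 0).
car.waypoints = [(10.0 + 1.5 * k, 3.5 - 0.875 * k, 0.0) for k in range(1, 5)]
print(layout.frame(4))
```

Keeping the layout explicit means an edit like this changes only the conditioning. Downstream generation would then be rerun on the modified layout.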

Abstract

Generative world models have become essential data engines for autonomous driving, yet most existing efforts focus on videos or occupancy grids and overlook the unique properties of LiDAR. Extending LiDAR generation to dynamic 4D world modeling poses challenges in controllability, temporal coherence, and standardized evaluation. To this end, we present LiDARCrafter, a unified framework for 4D LiDAR generation and editing. Given free-form natural language inputs, we parse the instructions into ego-centric scene graphs, which condition a tri-branch diffusion network to generate object structures, motion trajectories, and geometry. These structured conditions enable diverse and fine-grained scene editing. Additionally, an autoregressive module generates temporally coherent 4D LiDAR sequences with smooth transitions. To support standardized evaluation, we establish a comprehensive benchmark with diverse metrics spanning scene-, object-, and sequence-level aspects. Experiments on the nuScenes dataset using this benchmark demonstrate that LiDARCrafter achieves state-of-the-art performance in fidelity, controllability, and temporal consistency across all levels, paving the way for data augmentation and simulation. The code and benchmark are publicly released.
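To make "ego-centric scene graph" concrete, here is a minimal sketch of the structure a language parser might emit. The ego vehicle is node 0, and each object node links back to it through a spatial relation. The graph is then encoded as integer indices that a graph network could consume as conditioning. The label vocabularies, class names, and `to_indices` encoding are all hypothetical; the paper's actual schema is not given here.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical label vocabularies (not taken from the paper).
CATEGORIES = ["ego", "car", "truck", "pedestrian", "cyclist"]
MOTIONS = ["static", "forward", "turn_left", "turn_right", "overtake_left"]
RELATIONS = ["front_of", "behind", "left_of", "right_of", "front_left_of", "front_right_of"]

@dataclass
class Node:
    node_id: int
    category: str
    motion: str

@dataclass
class Edge:
    src: int       # object node
    dst: int       # related node (always the ego node in this sketch)
    relation: str

@dataclass
class SceneGraph:
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def __post_init__(self):
        # Ego-centric: node 0 is always the ego vehicle.
        if not self.nodes:
            self.nodes.append(Node(0, "ego", "forward"))

    def add_object(self, category: str, motion: str, relation_to_ego: str) -> int:
        node_id = len(self.nodes)
        self.nodes.append(Node(node_id, category, motion))
        self.edges.append(Edge(node_id, 0, relation_to_ego))
        return node_id

    def to_indices(self):
        """Integer encoding a graph network could take as conditioning input."""
        node_cat = [CATEGORIES.index(n.category) for n in self.nodes]
        node_mot = [MOTIONS.index(n.motion) for n in self.nodes]
        edge_index = [[e.src for e in self.edges], [e.dst for e in self.edges]]
        edge_rel = [RELATIONS.index(e.relation) for e in self.edges]
        return node_cat, node_mot, edge_index, edge_rel

# What a parser might produce for:
# "A car overtakes me on the left while a pedestrian stands ahead on the right."
graph = SceneGraph()
graph.add_object("car", "overtake_left", "left_of")
graph.add_object("pedestrian", "static", "front_right_of")
print(graph.to_indices())
# ([0, 1, 3], [1, 4, 0], [[1, 2], [0, 0]], [2, 5])
```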

Paper Structure

This paper contains 40 sections, 36 equations, 18 figures, 12 tables, 1 algorithm.

Figures (18)

  • Figure 1: We propose LiDARCrafter, a 4D LiDAR-based generative world model that supports controllable point cloud layout generation (left), dynamic sequential scene generation (center), and rich scene editing applications (right). Our framework enables intuitive "what you describe is what you get" LiDAR-based 4D world modeling.
  • Figure 2: Framework of LiDARCrafter. In the Text2Layout stage (cf. the section on layout generation), the natural-language instruction is parsed into an ego-centric scene graph, and a tri-branch diffusion network generates 4D conditions for bounding boxes, future trajectories, and object point clouds. In the Layout2Scene stage (cf. the section on static point cloud generation), a range-image diffusion model uses these conditions to generate a static LiDAR frame. In the Scene2Seq stage (cf. the section on 4D point cloud generation), an autoregressive module warps historical points with ego and object motion priors to generate each subsequent frame, producing a temporally coherent LiDAR sequence.
  • Figure 3: Structure of our range-image LiDAR diffusion model.
  • Figure 4: Details of the foreground and background warp.
  • Figure 5: Single-frame LiDAR point cloud generation results. LiDARCrafter produces the pattern closest to the ground truth, with notably superior foreground quality compared to other methods. Best viewed at high resolution.
  • ...and 13 more figures
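Figures 2 and 4 describe the Scene2Seq step as warping historical points with ego and object motion priors, handled separately for foreground and background. The sketch below covers only that rigid-transform part. It assumes the background is static in the world, each object's points are rigidly attached to its box, and poses are 4x4 homogeneous matrices. It also assumes every point carries an instance label (-1 for background). The function names `pose` and `warp_previous_frame` are hypothetical, and how the model then fills regions the warp leaves empty is not shown.

```python
import numpy as np

def pose(x: float, y: float, yaw: float) -> np.ndarray:
    """4x4 rigid transform for a planar pose (rotation about z, then translation)."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    T[0, 3], T[1, 3] = x, y
    return T

def warp_previous_frame(points, instance_ids, ego_prev, ego_curr, boxes_prev, boxes_curr):
    """Move frame t-1 points into the ego frame at t.

    points:                (N, 3) xyz in the ego frame at t-1
    instance_ids:          (N,) ints, -1 = background, k >= 0 = object k
    ego_prev, ego_curr:    (4, 4) world-from-ego poses at t-1 and t
    boxes_prev/boxes_curr: dict k -> (4, 4) ego-from-box pose, each expressed
                           in its own frame's ego coordinates
    """
    homo = np.hstack([points, np.ones((len(points), 1))])
    out = np.empty_like(homo)
    bg = instance_ids < 0

    # Background: assumed static in the world, so only ego motion moves it.
    T_bg = np.linalg.inv(ego_curr) @ ego_prev
    out[bg] = homo[bg] @ T_bg.T

    # Foreground: each object's points follow its box from t-1 to t.
    for k in np.unique(instance_ids[~bg]):
        mask = instance_ids == k
        T_fg = boxes_curr[int(k)] @ np.linalg.inv(boxes_prev[int(k)])
        out[mask] = homo[mask] @ T_fg.T
    return out[:, :3]

points = np.array([[10.0, 0.0, 0.0],   # background point 10 m ahead
                   [5.5, 3.0, 0.5]])   # point on car 0
ids = np.array([-1, 0])
warped = warp_previous_frame(
    points, ids,
    ego_prev=pose(0.0, 0.0, 0.0), ego_curr=pose(2.0, 0.0, 0.0),  # ego drives 2 m forward
    boxes_prev={0: pose(5.0, 3.0, 0.0)}, boxes_curr={0: pose(6.0, 3.0, 0.0)},
)
print(warped)
```

Here the background point ends up 8 m ahead because the ego advanced 2 m. The car point moves with its box and keeps its offset inside it. Splitting the two cases is what lets independently moving objects stay coherent across frames instead of smearing with the ego motion.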