U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences

Xiang Xu, Ao Liang, Youquan Liu, Linfeng Li, Lingdong Kong, Ziwei Liu, Qingshan Liu

TL;DR

U4D is an uncertainty-aware framework for 4D LiDAR world modeling that explicitly estimates spatial uncertainty and generates scenes with a two-stage diffusion process. The first stage reconstructs high-uncertainty regions; the second stage completes the remaining areas under learned structural priors. MoST blocks fuse spatial and temporal features to keep the generated sequence temporally coherent. The approach is reported to improve geometric fidelity and temporal stability on nuScenes and SemanticKITTI, and to improve downstream perception calibration and segmentation when the generated data is used for augmentation. More reliable 4D LiDAR synthesis of this kind supports simulation, pretraining, and evaluation for autonomous driving systems.

Abstract

Modeling dynamic 3D environments from LiDAR sequences is central to building reliable 4D worlds for autonomous driving and embodied AI. Existing generative frameworks, however, often treat all spatial regions uniformly, overlooking the varying uncertainty across real-world scenes. This uniform generation leads to artifacts in complex or ambiguous regions, limiting realism and temporal stability. In this work, we present U4D, an uncertainty-aware framework for 4D LiDAR world modeling. Our approach first estimates spatial uncertainty maps from a pretrained segmentation model to localize semantically challenging regions. It then performs generation in a "hard-to-easy" manner through two sequential stages: (1) uncertainty-region modeling, which reconstructs high-entropy regions with fine geometric fidelity, and (2) uncertainty-conditioned completion, which synthesizes the remaining areas under learned structural priors. To further ensure temporal coherence, U4D incorporates a mixture of spatio-temporal (MoST) block that adaptively fuses spatial and temporal representations during diffusion. Extensive experiments show that U4D produces geometrically faithful and temporally consistent LiDAR sequences, advancing the reliability of 4D world modeling for autonomous perception and simulation.
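
The uncertainty-estimation step described in the abstract can be illustrated with a minimal sketch. It assumes the pretrained segmentation model outputs one class-probability vector per point, computes the Shannon entropy of each vector, and marks the highest-entropy points as the "hard" region. The function names, the top-fraction selection rule, and the `ratio` parameter are assumptions made for illustration; the abstract does not specify how the uncertain region is thresholded.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def uncertainty_mask(prob_map, ratio=0.25):
    """Return per-point entropies and a mask of the most uncertain points.

    prob_map: list of per-point class-probability vectors (hypothetical
              stand-in for the segmentation model's softmax output).
    ratio:    fraction of points treated as "uncertain" (assumed rule,
              not taken from the paper).
    """
    entropies = [shannon_entropy(p) for p in prob_map]
    k = max(1, int(round(ratio * len(entropies))))
    # Entropy of the k-th most uncertain point is the cut-off.
    threshold = sorted(entropies, reverse=True)[k - 1]
    mask = [h >= threshold for h in entropies]
    return entropies, mask


if __name__ == "__main__":
    probs = [
        [1.0, 0.0, 0.0],        # confident prediction, zero entropy
        [1 / 3, 1 / 3, 1 / 3],  # maximally ambiguous over three classes
        [0.9, 0.05, 0.05],      # mostly confident
        [0.5, 0.5, 0.0],        # ambiguous between two classes
    ]
    entropies, mask = uncertainty_mask(probs, ratio=0.5)
    print(entropies)
    print(mask)
```

In this toy input, the two ambiguous points are selected as the uncertain region. In the "hard-to-easy" scheme, a region of this kind would be generated first and the rest completed afterwards.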

Paper Structure

This paper contains 22 sections, 23 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Overview of the U4D framework. U4D generates LiDAR scenes in a "hard-to-easy" manner through two stages. (1) It first estimates spatial uncertainty using a pretrained segmentation model $\mathcal{G}$ based on Shannon entropy, and performs an unconditional diffusion process to reconstruct high-fidelity geometry within the uncertain regions. (2) It then conducts uncertainty-conditioned completion, synthesizing the remaining scene areas guided by the reconstructed structures to ensure global consistency.
  • Figure 2: Illustration of the Mixture of Spatio-Temporal (MoST) block. It decomposes features along spatial and temporal dimensions and adaptively fuses them to maintain both spatial fidelity and temporal coherence. Near the network input and output, MoST emphasizes spatial cues, while in intermediate layers it focuses more on temporal dynamics.
  • Figure 3: Qualitative results of sequence point cloud generation on the nuScenes dataset [caesar2020nuscenes]. U4D preserves both geometric fidelity and temporal consistency, producing sequences most similar to the reference. It reliably reconstructs distant, sparse regions and captures dynamic objects across frames, maintaining coherent structure and motion. Frames are shown in temporal order from left to right. Colors encode point height. Best viewed zoomed in.
  • Figure 4: Qualitative results of generated objects within scenes on the nuScenes dataset [caesar2020nuscenes]. U4D accurately captures object geometry and preserves fine structural details while maintaining realistic spatial relationships within the scene. Rows from top to bottom show "car", "bus", and "truck". All objects are detected using a pretrained PointPillars [lang2019pointpillars] detector.
  • Figure 5: Qualitative results of sequence point cloud generation on the nuScenes dataset [caesar2020nuscenes]. U4D preserves both geometric fidelity and temporal consistency, producing sequences most similar to the reference. It reliably reconstructs distant, sparse regions and captures dynamic objects across frames, maintaining coherent structure and motion. Frames are shown in temporal order from top to bottom. Colors encode point height. Best viewed zoomed in.
  • ...and 1 more figure
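
The adaptive fusion described in the Figure 2 caption can be pictured with a small sketch. It assumes the simplest possible form: a single scalar gate, obtained by passing a learnable logit through a sigmoid, blends a spatial feature vector with a temporal one. The gate form, the function names, and the per-layer logits are all hypothetical; the caption states only that MoST fuses the two branches adaptively, weighting spatial cues near the input and output and temporal dynamics in intermediate layers.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_spatial_temporal(spatial, temporal, gate_logit):
    """Blend two equal-length feature vectors with a scalar gate.

    gate_logit > 0 favors the spatial branch, gate_logit < 0 favors the
    temporal branch. This is an assumed gating rule for illustration,
    not the MoST block as defined in the paper.
    """
    g = sigmoid(gate_logit)
    return [g * s + (1.0 - g) * t for s, t in zip(spatial, temporal)]


if __name__ == "__main__":
    spatial = [1.0, 2.0]
    temporal = [3.0, 4.0]
    # Hypothetical logits: outer layers lean spatial, middle layers temporal.
    for name, logit in [("outer", 2.0), ("middle", -2.0)]:
        print(name, fuse_spatial_temporal(spatial, temporal, logit))
```

A learned, per-layer gate of this kind is one way to obtain the depth-dependent behavior the caption describes, since each layer can settle on its own balance between the two branches.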