U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences
Xiang Xu, Ao Liang, Youquan Liu, Linfeng Li, Lingdong Kong, Ziwei Liu, Qingshan Liu
TL;DR
U4D is an uncertainty-aware framework for 4D LiDAR world modeling that explicitly estimates spatial uncertainty and generates scenes with a two-stage diffusion process. The first stage reconstructs high-uncertainty regions; the second stage completes the remaining regions conditioned on learned structural priors. Mixture of spatio-temporal (MoST) blocks fuse spatial and temporal features to keep the generated sequence temporally coherent. The approach yields superior geometric fidelity and temporal stability on nuScenes and SemanticKITTI, and improves downstream perception calibration and segmentation when the generated data is used for augmentation. More reliable 4D LiDAR synthesis of this kind supports simulation, pretraining, and evaluation for autonomous driving systems.
Abstract
Modeling dynamic 3D environments from LiDAR sequences is central to building reliable 4D worlds for autonomous driving and embodied AI. Existing generative frameworks, however, often treat all spatial regions uniformly, overlooking the varying uncertainty across real-world scenes. This uniform generation leads to artifacts in complex or ambiguous regions, limiting realism and temporal stability. In this work, we present U4D, an uncertainty-aware framework for 4D LiDAR world modeling. Our approach first estimates spatial uncertainty maps from a pretrained segmentation model to localize semantically challenging regions. It then performs generation in a "hard-to-easy" manner through two sequential stages: (1) uncertainty-region modeling, which reconstructs high-entropy regions with fine geometric fidelity, and (2) uncertainty-conditioned completion, which synthesizes the remaining areas under learned structural priors. To further ensure temporal coherence, U4D incorporates a mixture-of-spatio-temporal (MoST) block that adaptively fuses spatial and temporal representations during diffusion. Extensive experiments show that U4D produces geometrically faithful and temporally consistent LiDAR sequences, advancing the reliability of 4D world modeling for autonomous perception and simulation.
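To make the uncertainty-estimation step concrete, the sketch below derives a per-pixel uncertainty map from segmentation logits and splits it into "hard" and "easy" regions. It is an illustration under stated assumptions, not the authors' implementation: the abstract only says that uncertainty comes from a pretrained segmentation model and that the first stage targets high-entropy regions. The use of normalized Shannon entropy over softmax outputs, the (H, W, C) range-image layout, the function names `entropy_map` and `split_regions`, and the 0.8 quantile threshold are all hypothetical choices.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def entropy_map(logits):
    """Normalized Shannon entropy per pixel, in [0, 1].

    Assumes `logits` has shape (H, W, C), e.g. per-class scores that a
    segmentation model produced for a LiDAR range image (an assumption;
    the abstract does not specify the representation).
    """
    probs = softmax(logits, axis=-1)
    ent = -(probs * np.log(probs + 1e-12)).sum(axis=-1)
    # Divide by log(C) so a uniform prediction maps to 1.0.
    ent = ent / np.log(logits.shape[-1])
    return np.clip(ent, 0.0, 1.0)


def split_regions(uncertainty, quantile=0.8):
    """Split pixels into hard (high-uncertainty) and easy masks.

    The quantile threshold is a hypothetical choice for illustration.
    """
    threshold = np.quantile(uncertainty, quantile)
    hard = uncertainty > threshold
    return hard, ~hard
```

In a "hard-to-easy" pipeline of the kind the abstract describes, the `hard` mask would mark the regions handled by the first stage, and the `easy` mask the regions completed in the second stage conditioned on the first stage's output.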
