LaGen: Towards Autoregressive LiDAR Scene Generation
Sizhuo Zhou, Xiaosong Jia, Fanrui Zhang, Junjie Li, Juyong Zhang, Yukang Feng, Jianwen Sun, Songbur Wong, Junqi You, Junchi Yan
TL;DR
LaGen addresses the need for long-horizon, interactive LiDAR scene generation from a single frame by introducing a frame-by-frame autoregressive framework based on a Latent Diffusion Model. The method combines a range-image representation, a multi-condition diffusion generator, and two key modules, Scene Decoupling Estimation and Noise Modulation, to achieve strong spatiotemporal coherence. It supports interactive object-level edits and outperforms state-of-the-art LiDAR generation and prediction models on nuScenes under a dedicated long-horizon evaluation protocol. By integrating per-step decisions into future LiDAR predictions, the work supports closed-loop simulation and world modeling for autonomous driving.
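As an illustration of the range-image representation mentioned above, the following is a minimal sketch of the standard spherical projection that maps a LiDAR point cloud to a 2D range image. It is not taken from the LaGen implementation: the function name `points_to_range_image`, the image size, and the vertical field-of-view defaults are all assumptions chosen for illustration.

```python
import numpy as np


def points_to_range_image(points, height=32, width=1024,
                          fov_up_deg=10.0, fov_down_deg=-30.0):
    """Project an (N, 3) point cloud onto a (height, width) range image.

    All defaults are illustrative assumptions, not values from the paper.
    Empty pixels are set to 0; when several points fall into one pixel,
    the nearest range is kept.
    """
    points = np.asarray(points, dtype=np.float64)
    depth = np.linalg.norm(points, axis=1)
    valid = depth > 0                      # drop points at the sensor origin
    points, depth = points[valid], depth[valid]
    x, y, z = points[:, 0], points[:, 1], points[:, 2]

    yaw = np.arctan2(y, x)                 # azimuth in [-pi, pi]
    pitch = np.arcsin(z / depth)           # elevation angle

    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = fov_up - fov_down

    # Normalize angles to pixel coordinates (column from yaw, row from pitch).
    u = 0.5 * (1.0 - yaw / np.pi) * width
    v = (fov_up - pitch) / fov * height
    cols = np.clip(np.floor(u), 0, width - 1).astype(np.int64)
    rows = np.clip(np.floor(v), 0, height - 1).astype(np.int64)

    # Start from +inf so an unbuffered minimum keeps the nearest return.
    image = np.full((height, width), np.inf)
    np.minimum.at(image, (rows, cols), depth)
    image[np.isinf(image)] = 0.0
    return image
```

Operating on such an image lets 2D diffusion architectures be applied to LiDAR data; the inverse mapping (pixel indices and range back to 3D points) recovers a point cloud from a generated image.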
Abstract
Generative world models for autonomous driving (AD) have become a trending topic. Whereas the image modality has been widely studied, this work explores generative world models for LiDAR data. Existing generation methods for LiDAR data support only single-frame generation, while existing prediction approaches require multiple frames of historical input and can only deterministically predict multiple frames at once, which rules out interactivity. Neither paradigm supports long-horizon interactive generation. To this end, we introduce LaGen, which to the best of our knowledge is the first framework capable of frame-by-frame autoregressive generation of long-horizon LiDAR scenes. LaGen takes a single-frame LiDAR input as a starting point and uses bounding box information as conditions to generate high-fidelity 4D scene point clouds. In addition, we introduce a scene decoupling estimation module to enhance the model's interactive generation capability for object-level content, as well as a noise modulation module to mitigate error accumulation during long-horizon generation. We construct a protocol based on nuScenes for evaluating long-horizon LiDAR scene generation. Experimental results demonstrate that LaGen outperforms state-of-the-art LiDAR generation and prediction models, especially on the later frames.
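The frame-by-frame autoregressive scheme described above can be sketched as a simple rollout loop, in which each generated frame becomes the input for the next step together with that step's bounding box conditions. This is only a schematic under stated assumptions: `rollout`, `generate_next`, and `toy_generator` are hypothetical names, and the toy generator is a stand-in for the conditional diffusion model, not the method from the paper.

```python
from typing import Callable, List, Sequence

import numpy as np


def rollout(first_frame: np.ndarray,
            box_sequence: Sequence[np.ndarray],
            generate_next: Callable[[np.ndarray, np.ndarray], np.ndarray],
            ) -> List[np.ndarray]:
    """Generate frames one at a time from a single starting frame.

    `generate_next(previous_frame, boxes)` is a hypothetical interface for
    a conditional generator. Because the boxes are supplied per step, they
    can be edited between steps, which is what makes the loop interactive.
    """
    frames = [first_frame]
    for boxes in box_sequence:
        # Condition only on the most recent frame and this step's boxes.
        frames.append(generate_next(frames[-1], boxes))
    return frames


def toy_generator(previous_frame: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """Stand-in for the learned generator, used only to exercise the loop."""
    return previous_frame + boxes.sum()
```

Since every step consumes the previous output, errors in early frames propagate to later ones, which is the error accumulation that the abstract's noise modulation module is meant to mitigate.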
