Towards Realistic Scene Generation with LiDAR Diffusion Models

Haoxi Ran, Vitor Guizilini, Yue Wang

TL;DR

This work introduces LiDAR Diffusion Models (LiDMs), a latent-diffusion framework for LiDAR scene generation that compresses range-image data into a LiDAR-aware latent space. Using curve-wise compression, point-wise coordinate supervision, and patch-wise encoding, LiDMs preserve LiDAR patterns, scene geometry, and object context while enabling efficient diffusion and multimodal conditioning (semantic maps, camera views, and text). The method achieves competitive unconditional generation in the 64-beam scenario and state-of-the-art conditional generation, while running up to 107$\times$ faster than prior point-based diffusion models. The result is controllable LiDAR synthesis for autonomous driving and robotics applications.

Abstract

Diffusion models (DMs) excel in photo-realistic image synthesis, but their adaptation to LiDAR scene generation poses a substantial hurdle. This is primarily because DMs operating in the point space struggle to preserve the curve-like patterns and 3D geometry of LiDAR scenes, which consumes much of their representation power. In this paper, we propose LiDAR Diffusion Models (LiDMs) to generate LiDAR-realistic scenes from a latent space tailored to capture the realism of LiDAR scenes by incorporating geometric priors into the learning pipeline. Our method targets three major desiderata: pattern realism, geometry realism, and object realism. Specifically, we introduce curve-wise compression to simulate real-world LiDAR patterns, point-wise coordinate supervision to learn scene geometry, and patch-wise encoding for a full 3D object context. With these three core designs, our method achieves competitive performance on unconditional LiDAR generation in the 64-beam scenario and state-of-the-art performance on conditional LiDAR generation, while maintaining high efficiency compared to point-based DMs (up to 107$\times$ faster). Furthermore, by compressing LiDAR scenes into a latent space, we enable the controllability of DMs with various conditions such as semantic maps, camera views, and text prompts.
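
The summary above describes LiDMs as operating on range-image data rather than on raw points. As a rough illustration of what that representation looks like, the following minimal sketch projects a point cloud onto a range image with a standard spherical projection and maps it back to 3D coordinates. It is not the authors' implementation: the function names, the 64 x 1024 resolution, and the vertical field of view (+3 to -25 degrees, typical of a 64-beam sensor) are all assumptions chosen for the example.

```python
import numpy as np


def points_to_range_image(points, height=64, width=1024,
                          fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) point cloud onto an (H, W) range image.

    Resolution and field of view are assumed values, not taken from the paper.
    Empty pixels are left at 0.
    """
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    depth = np.linalg.norm(points, axis=1)
    keep = depth > 0
    points, depth = points[keep], depth[keep]

    yaw = np.arctan2(points[:, 1], points[:, 0])    # azimuth in [-pi, pi]
    pitch = np.arcsin(points[:, 2] / depth)         # elevation angle

    u = 0.5 * (1.0 - yaw / np.pi)                   # horizontal coordinate in [0, 1]
    v = (fov_up - pitch) / (fov_up - fov_down)      # vertical coordinate in [0, 1]
    in_fov = (v >= 0.0) & (v < 1.0)
    u, v, depth = u[in_fov], v[in_fov], depth[in_fov]

    col = np.clip(np.floor(u * width), 0, width - 1).astype(np.int64)
    row = np.floor(v * height).astype(np.int64)

    image = np.zeros((height, width), dtype=np.float32)
    order = np.argsort(-depth)                      # far points first, near points overwrite
    image[row[order], col[order]] = depth[order]
    return image


def range_image_to_points(image, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Back-project non-empty pixels to (M, 3) coordinates using pixel centres."""
    height, width = image.shape
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    row, col = np.nonzero(image > 0)
    depth = image[row, col]

    yaw = np.pi * (1.0 - 2.0 * (col + 0.5) / width)
    pitch = fov_up - (row + 0.5) / height * (fov_up - fov_down)

    x = depth * np.cos(pitch) * np.cos(yaw)
    y = depth * np.cos(pitch) * np.sin(yaw)
    z = depth * np.sin(pitch)
    return np.stack([x, y, z], axis=1)
```

In a representation like this, each image row roughly corresponds to one laser beam, so the curve-like scan patterns mentioned in the abstract appear as horizontal structure, and the back-projected coordinates are the kind of quantity that a point-wise coordinate loss could be computed on.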

Paper Structure

This paper contains 41 sections, 13 equations, 12 figures, and 5 tables.

Figures (12)

  • Figure 1: Our method (LiDM) establishes a new state-of-the-art in unconditional LiDAR-realistic scene generation, and marks a milestone towards conditional LiDAR scene generation from different input modalities.
  • Figure 2: An overview of LiDMs on 64-beam data, which includes three parts: LiDAR Compression (cf. Sec. \ref{sec:real} & \ref{sec:train}), Multimodal Conditioning (cf. Sec. \ref{sec:condition}), and LiDAR Diffusion (cf. Sec. \ref{sec:train}).
  • Figure 3: Samples from LiDARGen \cite{zyrianov2022learning}, Latent Diffusion \cite{rombach2022high}, and our LiDMs in the 64-beam scenario.
  • Figure 4: Samples from our LiDMs in the 32-beam scenario.
  • Figure 5: Samples from our LiDM for Semantic-Map-to-LiDAR generation on SemanticKITTI \cite{behley2019semantickitti}.
  • ...and 7 more figures