RangeLDM: Fast Realistic LiDAR Point Cloud Generation

Qianjiang Hu, Zhimin Zhang, Wei Hu

TL;DR

RangeLDM is a novel approach for rapidly generating high-quality range-view LiDAR point clouds with latent diffusion models. It corrects the range-view data distribution via Hough voting so that point clouds project accurately onto range images, and it leverages a diffusion model to enhance expressivity.

Abstract

Autonomous driving demands high-quality LiDAR data, yet the cost of physical LiDAR sensors presents a significant scaling-up challenge. While recent efforts have explored deep generative models to address this issue, they often consume substantial computational resources with slow generation speeds while suffering from a lack of realism. To address these limitations, we introduce RangeLDM, a novel approach for rapidly generating high-quality range-view LiDAR point clouds via latent diffusion models. We achieve this by correcting range-view data distribution for accurate projection from point clouds to range images via Hough voting, which has a critical impact on generative learning. We then compress the range images into a latent space with a variational autoencoder, and leverage a diffusion model to enhance expressivity. Additionally, we instruct the model to preserve 3D structural fidelity by devising a range-guided discriminator. Experimental results on KITTI-360 and nuScenes datasets demonstrate both the robust expressiveness and fast speed of our LiDAR point cloud generation.
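
For context, the conventional spherical projection from a point cloud to a range image, which range-view methods typically start from, can be sketched as follows. This is a minimal illustration, not the Hough-voting-corrected projection proposed in the paper. The function name, the image size, and the vertical field-of-view defaults are assumptions chosen for the example.

```python
import numpy as np


def spherical_projection(points, height=64, width=1024,
                         fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) point cloud onto a (height, width) range image.

    The image size and field-of-view defaults are illustrative assumptions,
    not values taken from the paper. Empty pixels stay at 0. When several
    points land in the same pixel, the closest one is kept.
    """
    points = np.asarray(points, dtype=np.float64)
    r = np.linalg.norm(points, axis=1)
    keep = r > 0                      # drop points at the sensor origin
    x, y, z, r = points[keep, 0], points[keep, 1], points[keep, 2], r[keep]

    yaw = np.arctan2(y, x)            # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)          # inclination

    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)

    # Map angles to continuous pixel coordinates, then to integer indices.
    u = 0.5 * (1.0 - yaw / np.pi) * width
    v = (fov_up - pitch) / (fov_up - fov_down) * height
    u = np.clip(np.floor(u), 0, width - 1).astype(np.int64)
    v = np.clip(np.floor(v), 0, height - 1).astype(np.int64)

    image = np.zeros((height, width), dtype=np.float32)
    # Write far points first so that nearer points overwrite them.
    order = np.argsort(-r)
    image[v[order], u[order]] = r[order]
    return image
```

Because this mapping assumes evenly spaced beams that all originate from a single point, it only approximates a real sensor's layout. The abstract identifies the accuracy of this projection as having a critical impact on generative learning.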

Paper Structure (14 sections, 9 equations, 8 figures, 9 tables)

This paper contains 14 sections, 9 equations, 8 figures, 9 tables.

Figures (8)

  • Figure 1: (a) Unconditional LiDAR point cloud generation with realistic global structure. (b) Conditional LiDAR point cloud generation, including LiDAR point cloud upsampling and inpainting. (c) Generation quality (Maximum Mean Discrepancy, abbr. MMD) vs. generation speed (samples/s) of competitive LiDAR point cloud generation methods on the KITTI-360 \cite{liao2022kitti} dataset. The proposed method outperforms the state-of-the-art methods LiDARGen \cite{zyrianov2022learning} and UltraLiDAR \cite{xiong2023learning} in both generation quality and generation speed. All speeds are evaluated on a single RTX 3090 GPU.
  • Figure 2: The framework of the proposed RangeLDM. First, we project point clouds onto high-quality range images via Hough voting (Section \ref{subsec:range_projection}). Next, we train a VAE to compress the range images into low-dimensional latent features $\mathbf{z}_0$: the encoder $\mathcal{E}_\zeta$ encodes the range images, and the decoder $\mathcal{G}_\eta$ reconstructs them from the latent features (Section \ref{subsubsec:vae}). A range-guided discriminator $\mathcal{D}_\tau$ is introduced to guide the decoder in reconstructing 3D structures. Finally, we train a latent diffusion model to capture the distribution of the latent features (Section \ref{subsubsec:diffusion}). With optional conditional inputs, the proposed method is applicable to tasks such as point cloud upsampling and inpainting (Section \ref{sec:generation_tasks}).
  • Figure 3: Comparison of range projection by the typical method described in Eq. \ref{eq:ori_sph} and our method with Hough voting as in Eq. \ref{eq:new_spi}.
  • Figure 4: Qualitative comparison against baselines for unconditional LiDAR generation on KITTI-360. Real point clouds are shown for reference only. Our model produces results that closely resemble real-world data and excels at generating road-scene elements such as cars (first row), road bollards (second row), and crossroads (last row).
  • Figure 5: Comparison of upsampling results on KITTI-360. We downsample the ground truth by a factor of four to form the input and show the results of different methods for $4\times$ upsampling of that input.
  • ...and 3 more figures
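
The input construction described in the Figure 5 caption can be sketched as below. The caption states only that the ground truth is downsampled by a factor of four. Keeping every fourth row (beam) of the range image is an assumption made here for illustration, and the helper names `downsample_beams` and `known_row_mask` are hypothetical.

```python
import numpy as np


def downsample_beams(range_image, factor=4):
    """Keep every `factor`-th row (beam) of a range image.

    Assumption: downsampling acts on the vertical (beam) axis only.
    """
    range_image = np.asarray(range_image)
    if range_image.shape[0] % factor != 0:
        raise ValueError("image height must be divisible by the factor")
    return range_image[::factor, :]


def known_row_mask(height, factor=4):
    """Boolean mask over rows: True where the sparse input provides data."""
    mask = np.zeros(height, dtype=bool)
    mask[::factor] = True
    return mask


# Example: a 64-beam range image reduced to 16 beams.
dense = np.random.default_rng(0).uniform(1.0, 80.0, size=(64, 1024))
sparse = downsample_beams(dense, factor=4)
mask = known_row_mask(dense.shape[0], factor=4)
```

An upsampling method would then be asked to recover the rows where `mask` is False, and its output compared against `dense`.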