Taming Transformers for Realistic Lidar Point Cloud Generation

Hamed Haghighi, Amir Samadi, Mehrdad Dianati, Valentina Donzella, Kurt Debattista

TL;DR

LidarGRIT tackles the realism gap in Lidar point cloud generation by generating range images in a latent space with an autoregressive transformer and decoding them via a VQ-VAE that separately produces clean ranges and raydrop masks. The approach combines a dedicated raydrop loss and geometry-preservation during VQ-VAE training, plus an autoregressive token-space generator, enabling both accurate raydrop noise and high-fidelity geometry. Evaluations on KITTI-360 and KITTI-odometry show LidarGRIT achieving state-of-the-art results across image, BEV, and point-cloud representations, with notable gains in image-based realism due to improved raydrop synthesis. This work provides a practical, scalable framework for realistic Lidar simulation with potential benefits for autonomous driving training and evaluation.

Abstract

Diffusion Models (DMs) have achieved State-Of-The-Art (SOTA) results in the Lidar point cloud generation task, benefiting from their stable training and iterative refinement during sampling. However, DMs often fail to realistically model Lidar raydrop noise due to their inherent denoising process. To retain the strength of iterative sampling while enhancing the generation of raydrop noise, we introduce LidarGRIT, a generative model that uses auto-regressive transformers to iteratively sample range images in the latent space rather than in the image space. Furthermore, LidarGRIT utilises a VQ-VAE to separately decode range images and raydrop masks. Our results show that LidarGRIT achieves superior performance compared to SOTA models on the KITTI-360 and KITTI odometry datasets. Code available at: https://github.com/hamedhaghighi/LidarGRIT.
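
The two-stage idea in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: `sample_tokens`, `apply_raydrop`, `TOY_DECODER`, the uniform next-token logits, and the convention that a positive logit means "keep this pixel" are all hypothetical stand-ins chosen for illustration. Thresholding the mask at 0.5 is also an assumption; the paper may binarise the mask differently.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_tokens(next_token_logits, num_tokens, rng):
    """Sample codebook indices one at a time, each conditioned on the prefix."""
    tokens = []
    for _ in range(num_tokens):
        # next_token_logits is a hypothetical stand-in for the AR transformer.
        logits = next_token_logits(tokens)
        peak = max(logits)
        weights = [math.exp(v - peak) for v in logits]  # unnormalised softmax
        tokens.append(rng.choices(range(len(logits)), weights=weights, k=1)[0])
    return tokens


def apply_raydrop(clean_range, keep_logits, threshold=0.5):
    """Zero every pixel whose predicted keep-probability is below the threshold."""
    return [
        [r if sigmoid(l) >= threshold else 0.0 for r, l in zip(range_row, logit_row)]
        for range_row, logit_row in zip(clean_range, keep_logits)
    ]


# Toy stand-ins (assumptions): uniform next-token logits over a 4-entry codebook,
# and a "decoder" that maps each token to one (range in metres, keep-logit) pixel.
TOY_DECODER = {0: (5.0, 4.0), 1: (12.5, 2.0), 2: (30.0, -1.0), 3: (60.0, -4.0)}

rng = random.Random(0)
tokens = sample_tokens(lambda prefix: [0.0] * 4, num_tokens=8, rng=rng)
clean = [[TOY_DECODER[t][0] for t in tokens]]   # clean range image (1 x 8)
logits = [[TOY_DECODER[t][1] for t in tokens]]  # raydrop mask logits (1 x 8)
noisy = apply_raydrop(clean, logits)            # range image with dropped rays
print(tokens)
print(noisy)
```

The sketch keeps the two outputs separate until the last step, which mirrors the description above: geometry comes from the clean range values, while raydrop noise comes only from the mask.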

Paper Structure (13 sections, 6 equations, 3 figures, 3 tables)

Figures (3)

  • Figure 1: (a) The range image generated by a diffusion model (R2DM [nakashima2024lidar]) exhibits less realistic raydrop noise than our sample and the real one. (b) We propose to sample the range image in the latent space via an Auto-Regressive (AR) transformer [10.5555/3295222.3295349]. (c) We then generate the raydrop mask and the clean range image separately in the image space via a VQ-VAE [DBLP:conf/cvpr/EsserRO21] decoder.
  • Figure 2: Overview of the training process.
  • Figure 3: Qualitative comparison on the KITTI-360 [Liao2022PAMI] and KITTI odometry [Geiger2013IJRR] datasets.