Taming Transformers for Realistic Lidar Point Cloud Generation
Hamed Haghighi, Amir Samadi, Mehrdad Dianati, Valentina Donzella, Kurt Debattista
TL;DR
LidarGRIT tackles the realism gap in Lidar point cloud generation by generating range images in a latent space with an autoregressive transformer and decoding them via a VQ-VAE that separately produces clean ranges and raydrop masks. The approach combines a dedicated raydrop loss and geometry preservation during VQ-VAE training with an autoregressive token-space generator, enabling both accurate raydrop noise and high-fidelity geometry. Evaluations on KITTI-360 and KITTI odometry show LidarGRIT achieving state-of-the-art results across image, BEV, and point-cloud representations, with notable gains in image-based realism due to improved raydrop synthesis. This work provides a practical, scalable framework for realistic Lidar simulation with potential benefits for autonomous driving training and evaluation.
Abstract
Diffusion Models (DMs) have achieved state-of-the-art (SOTA) results in the Lidar point cloud generation task, benefiting from their stable training and iterative refinement during sampling. However, DMs often fail to realistically model Lidar raydrop noise due to their inherent denoising process. To retain the strength of iterative sampling while enhancing the generation of raydrop noise, we introduce LidarGRIT, a generative model that uses auto-regressive transformers to iteratively sample the range images in the latent space rather than image space. Furthermore, LidarGRIT utilises VQ-VAE to separately decode range images and raydrop masks. Our results show that LidarGRIT achieves superior performance compared to SOTA models on KITTI-360 and KITTI odometry datasets. Code available at: https://github.com/hamedhaghighi/LidarGRIT.
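
The two ideas in the abstract, a discrete latent space and a separately decoded raydrop mask, can be illustrated with a minimal NumPy sketch. It is not taken from the LidarGRIT code base: the function names `quantize` and `apply_raydrop`, the use of sigmoid logits for the mask, the 0.5 threshold, and the convention that a dropped ray is stored as range 0 are all assumptions made for illustration.

```python
import numpy as np


def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry.

    latents:  array of shape (N, D)
    codebook: array of shape (K, D)
    Returns an integer array of shape (N,) with values in [0, K).
    """
    # Squared Euclidean distance between every latent and every code.
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def apply_raydrop(clean_range, raydrop_logits, threshold=0.5):
    """Combine a clean range image with a predicted raydrop mask.

    Assumption: the mask head outputs per-pixel logits for "ray returned",
    and dropped rays are represented by a range value of 0.
    """
    keep_prob = 1.0 / (1.0 + np.exp(-raydrop_logits))  # sigmoid
    mask = keep_prob >= threshold
    return clean_range * mask, mask


# Hypothetical toy inputs: a 2-entry codebook and a 1x2 range image.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
latents = np.array([[0.1, -0.1], [0.9, 1.2]])
tokens = quantize(latents, codebook)

clean = np.array([[5.0, 10.0]])
logits = np.array([[3.0, -3.0]])
noisy, mask = apply_raydrop(clean, logits)
```

In this sketch the token indices returned by `quantize` stand in for the sequence an autoregressive transformer would model, while `apply_raydrop` shows why decoding the mask separately keeps the range values themselves free of dropout noise.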
