UltraLiDAR: Learning Compact Representations for LiDAR Completion and Generation

Yuwen Xiong, Wei-Chiu Ma, Jingkang Wang, Raquel Urtasun

TL;DR

UltraLiDAR tackles the sparsity and cost of LiDAR data by learning a compact, discrete representation of scene geometry. It voxelizes LiDAR sweeps into BEV occupancy grids and applies a VQ-VAE, which yields both robust sparse-to-dense completion and a codebook that supports unconditional generation, conditional generation, and manipulation of LiDAR scenes. Empirically, densifying sparse data improves downstream perception, the model generalizes across datasets, and human raters prefer its generated scenes over those of prior methods more than 98.5% of the time. The discrete, controllable representation makes the framework a candidate for data-efficient LiDAR simulation and perception research in self-driving.
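As a rough illustration of the voxelization step mentioned above, the sketch below bins 3D points into a binary occupancy grid viewed from above. The grid extents, the 0.5 m cell size, and the function name `voxelize_bev` are illustrative assumptions rather than values from the paper, and plain Python lists are used only to keep the sketch dependency-free.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def voxelize_bev(
    points: Sequence[Point],
    x_range: Tuple[float, float] = (-4.0, 4.0),  # assumed extents, in metres
    y_range: Tuple[float, float] = (-4.0, 4.0),
    z_range: Tuple[float, float] = (-2.0, 2.0),
    cell: float = 0.5,                           # assumed voxel size, in metres
) -> List[List[List[int]]]:
    """Return a binary occupancy grid indexed as grid[z][y][x]."""
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    nz = int(round((z_range[1] - z_range[0]) / cell))
    grid = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    for x, y, z in points:
        # Floor-divide the offset from the lower bound to get the cell index.
        ix = int((x - x_range[0]) // cell)
        iy = int((y - y_range[0]) // cell)
        iz = int((z - z_range[0]) // cell)
        # Points outside the grid are dropped rather than clamped.
        if 0 <= ix < nx and 0 <= iy < ny and 0 <= iz < nz:
            grid[iz][iy][ix] = 1
    return grid


# Two nearby points share one cell; the far-away point is discarded.
grid = voxelize_bev([(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (100.0, 0.0, 0.0)])
occupied = sum(v for plane in grid for row in plane for v in row)
```

Treating the height slices `grid[z]` as channels of a 2D image is one common way to feed such a grid to a 2D encoder; whether UltraLiDAR does exactly this is not stated in this excerpt.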

Abstract

LiDAR provides accurate geometric measurements of the 3D world. Unfortunately, dense LiDARs are very expensive and the point clouds captured by low-beam LiDAR are often sparse. To address these issues, we present UltraLiDAR, a data-driven framework for scene-level LiDAR completion, LiDAR generation, and LiDAR manipulation. The crux of UltraLiDAR is a compact, discrete representation that encodes the point cloud's geometric structure, is robust to noise, and is easy to manipulate. We show that by aligning the representation of a sparse point cloud to that of a dense point cloud, we can densify the sparse point clouds as if they were captured by a real high-density LiDAR, drastically reducing the cost. Furthermore, by learning a prior over the discrete codebook, we can generate diverse, realistic LiDAR point clouds for self-driving. We evaluate the effectiveness of UltraLiDAR on sparse-to-dense LiDAR completion and LiDAR generation. Experiments show that densifying real-world point clouds with our approach can significantly improve the performance of downstream perception systems. Compared to prior art on LiDAR generation, our approach generates much more realistic point clouds. According to an A/B test, human participants prefer our results over those of previous methods more than 98.5% of the time.
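The "compact, discrete representation" in a VQ-VAE comes from replacing each encoder feature vector with the index of its nearest codebook entry. The sketch below shows only that lookup. The toy 2D codebook and the names `quantize` and `dequantize` are illustrative assumptions; in practice the codebook is learned jointly with the encoder and decoder.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def quantize(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]


def dequantize(codes: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Look the codes back up; a decoder would consume these vectors."""
    return [list(codebook[k]) for k in codes]


# Toy codebook with three entries (assumed values, for illustration only).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
features = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]
codes = quantize(features, codebook)
restored = dequantize(codes, codebook)
```

Because the scene is reduced to a grid of integer codes, completion can be framed as mapping a sparse input to the codes of its dense counterpart, and generation as sampling codes from a learned prior.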

Paper Structure

This paper contains 50 sections, 2 equations, 16 figures, and 6 tables.

Figures (16)

  • Figure 1: Overview of the UltraLiDAR pipeline. (a) For LiDAR completion, the sparse encoder maps the sparse point cloud to discrete codes, and the dense decoder reconstructs dense data from them. (b) For LiDAR generation, the transformer starts from either a blank canvas or a canvas holding codes mapped from partial observations, then iteratively predicts and updates the missing parts. The decoder turns the predicted codes into the generated LiDAR output.
  • Figure 2: LiDAR completion and 3D detection on Pandaset. With densified point clouds, the detection model identifies more objects and produces fewer false negatives. Detection results are shown as red boxes and ground truth as blue boxes. The missing area far from the ego vehicle in the densified results is caused by an uphill slope; see the supplementary material for an explanation with camera visualization.
  • Figure 3: Qualitative comparison against baselines on unconditional LiDAR generation. We compare with two state-of-the-art LiDAR generation methods, Projected GAN (Sauer et al., 2021) and LiDARGen (Zyrianov et al., 2022), and include real data for reference. Our model generates results with more structured layouts and clearer beam patterns.
  • Figure 4: Unconditional generation results on Pandaset. We train our model on dense Pandaset data and generate dense results. The generated samples show diverse scene layouts with plausible actor placement (e.g., the parked car in the right sample). The synthesized point clouds are realistic enough that a pre-trained detector works on them out of the box.
  • Figure 5: Conditional LiDAR generation for dirt removal. We mask the red rectangular region in the range-view image to mimic a dirt occluder. Left: original input; the masked region is not visible to the model. Right: our generation results. Our model successfully recovers the partially observed vehicle.
  • ...and 11 more figures
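
The iterative generation loop described in the Figure 1(b) caption can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: `toy_predict` is a hypothetical stand-in for the transformer, and the schedule (commit the most confident share of the missing codes at each step) is an assumed choice that the caption does not specify.

```python
from typing import Callable, List, Optional, Tuple

Canvas = List[Optional[int]]  # None marks a code that is still missing
Predictor = Callable[[Canvas], List[Tuple[int, float]]]  # (code, confidence) per slot


def iterative_fill(canvas: Canvas, predict: Predictor, steps: int = 4) -> Canvas:
    """Fill missing codes over several rounds, keeping observed codes fixed."""
    canvas = list(canvas)
    for step in range(steps):
        missing = [i for i, code in enumerate(canvas) if code is None]
        if not missing:
            break
        guesses = predict(canvas)
        # Assumed schedule: commit an equal share of the remaining slots per
        # step, most confident first; the final step commits everything left.
        n_commit = -(-len(missing) // (steps - step))  # ceiling division
        missing.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in missing[:n_commit]:
            canvas[i] = guesses[i][0]
    return canvas


def toy_predict(canvas: Canvas) -> List[Tuple[int, float]]:
    # Hypothetical predictor: a fixed code pattern with confidence that
    # decreases with position. A real model would condition on the canvas.
    return [(i % 3, 1.0 / (1 + i)) for i in range(len(canvas))]


# Conditional case: codes 7 and 5 come from partial observations.
conditional = iterative_fill([7, None, None, None, 5], toy_predict, steps=2)
# Unconditional case: start from a blank canvas.
unconditional = iterative_fill([None] * 4, toy_predict, steps=2)
```

The same loop covers both settings in the caption: a blank canvas gives unconditional generation, while a canvas pre-filled with observed codes gives conditional generation, since observed slots are never overwritten.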