Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion

Lucas Nunes, Rodrigo Marcuzzi, Benedikt Mersch, Jens Behley, Cyrill Stachniss

TL;DR

The paper tackles the challenge of completing 3D LiDAR scenes from a single sparse scan. It introduces a point-level denoising diffusion probabilistic model (DDPM) that operates directly on raw LiDAR points at scene scale, using a local diffusion formulation where noise is added per point and conditioning on the input scan guides generation. A noise-prediction regularization is proposed to stabilize training by encouraging the predicted noise to follow a standard normal distribution, and a refinement network upscales and sharpens the completed scene. Across SemanticKITTI, KITTI-360, and additional data, the method achieves state-of-the-art or competitive Chamfer distance, JSD, and IoU metrics, while remaining generalizable to different datasets without voxelization or projection. This work enables realistic, detailed scene completion for autonomous driving and suggests a scalable direction for diffusion models on real-world 3D point clouds.
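To make the "noise is added per point" idea more concrete, the following is a minimal NumPy sketch contrasting a standard DDPM forward step with a point-wise variant. It is an illustration under stated assumptions, not the authors' implementation: the linear beta schedule, the function names, the synthetic scan, and the exact form of the local step (a zero-mean Gaussian offset around each point, scaled by $\sqrt{1-\bar{\alpha}_t}$) are all assumptions introduced here.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed values)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def global_noising(points, t, alpha_bar, rng):
    """Standard DDPM forward step: coordinates are scaled toward the origin as t grows."""
    eps = rng.standard_normal(points.shape)
    noisy = np.sqrt(alpha_bar[t]) * points + np.sqrt(1.0 - alpha_bar[t]) * eps
    return noisy, eps

def local_noising(points, t, alpha_bar, rng):
    """Point-wise variant (assumed form): each point keeps its position and
    receives a zero-mean Gaussian offset whose scale grows with t."""
    eps = rng.standard_normal(points.shape)
    noisy = points + np.sqrt(1.0 - alpha_bar[t]) * eps
    return noisy, eps

rng = np.random.default_rng(0)
# Synthetic stand-in for a LiDAR scan, coordinates in meters (not real data).
scan = rng.uniform(-50.0, 50.0, size=(2048, 3))
alpha_bar = make_alpha_bar()
t_last = len(alpha_bar) - 1

noisy_global, _ = global_noising(scan, t_last, alpha_bar, rng)
noisy_local, _ = local_noising(scan, t_last, alpha_bar, rng)

# Largest absolute coordinate before and after noising.
extent_input = np.abs(scan).max()
extent_global = np.abs(noisy_global).max()
extent_local = np.abs(noisy_local).max()
```

In this sketch, the global step at the final timestep collapses the scan into a small blob around the origin, while the local step leaves the scene extent essentially unchanged, which is one way to see why a per-point formulation may suit scene-scale data without normalization.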

Abstract

Computer vision techniques play a central role in the perception stack of autonomous vehicles. Such methods are employed to perceive the vehicle surroundings given sensor data. 3D LiDAR sensors are commonly used to collect sparse 3D point clouds from the scene. However, compared to human perception, such systems struggle to deduce the unseen parts of the scene given those sparse point clouds. In this context, the scene completion task aims at predicting the gaps in the LiDAR measurements to achieve a more complete scene representation. Given the promising results of recent diffusion models as generative models for images, we propose extending them to achieve scene completion from a single 3D LiDAR scan. Previous works used diffusion models over range images extracted from LiDAR data, directly applying image-based diffusion methods. In contrast, we propose to operate directly on the points, reformulating the noising and denoising diffusion process such that it can efficiently work at scene scale. Together with our approach, we propose a regularization loss to stabilize the noise predicted during the denoising process. Our experimental evaluation shows that our method can complete the scene given a single LiDAR scan as input, producing a scene with more detail compared to state-of-the-art scene completion methods. We believe that our proposed diffusion process formulation can support further research in diffusion models applied to scene-scale point cloud data.
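The regularization loss mentioned in the abstract can be sketched as a penalty on the statistics of the predicted noise. The snippet below is a hedged illustration only: the specific penalty (squared mean plus squared deviation of the standard deviation from one), the weight `reg_weight`, and all function names are assumptions made here, and the paper's exact formulation may differ.

```python
import numpy as np

def diffusion_loss(eps_pred, eps_true):
    """Usual noise-prediction objective: mean squared error to the sampled noise."""
    return float(np.mean((eps_pred - eps_true) ** 2))

def noise_regularizer(eps_pred):
    """Assumed penalty: push the predicted noise toward zero mean and unit std."""
    mean = eps_pred.mean()
    std = eps_pred.std()
    return float(mean ** 2 + (std - 1.0) ** 2)

def total_loss(eps_pred, eps_true, reg_weight=1.0):
    """Diffusion loss plus the weighted regularizer (reg_weight is hypothetical)."""
    return diffusion_loss(eps_pred, eps_true) + reg_weight * noise_regularizer(eps_pred)

rng = np.random.default_rng(0)
eps_true = rng.standard_normal((4096, 3))

# A prediction that stays close to the sampled noise.
well_scaled = eps_true + 0.01 * rng.standard_normal(eps_true.shape)
# A prediction whose variance has shrunk and whose mean has drifted.
drifted = 0.5 * eps_true + 0.3

reg_well = noise_regularizer(well_scaled)
reg_drift = noise_regularizer(drifted)
```

Under these assumptions, the regularizer stays near zero for the well-scaled prediction and grows for the drifted one, so it penalizes exactly the kind of shift in mean and standard deviation that a noise predictor could accumulate over denoising steps.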

Paper Structure (14 sections, 10 equations, 6 figures, 5 tables)

This paper contains 14 sections, 10 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Starting from a single input scan $\mathcal{P}$, we add Gaussian noise to each point, defining the noisy input $\mathcal{P}^T$. Then, we use our trained noise predictor $\epsilon_\theta$ to denoise $\mathcal{P}^T$ iteratively until arriving at $\mathcal{P}^0$, yielding a completed representation of the 3D scene.
  • Figure 2: Comparison between Gaussian noise with standard deviation $\sigma$ and mean $\mu$ over non-normalized and normalized input point cloud and our proposed local point-wise noise formulation.
  • Figure 3: Mean and standard deviation of the predicted noise $\boldsymbol{\epsilon}_\theta$ without the noise regularization. In this experiment, we use DPM-Solver (Lu et al., NeurIPS 2022) to reduce the denoising steps from $1{,}000$ to $10$.
  • Figure 4: Diagram of the conditioning in each layer $l$.
  • Figure 5: Qualitative results on one scan from KITTI-360. Colors depict point height normalized by the height range of each point cloud.
  • ...and 1 more figure