Fast LiDAR Upsampling using Conditional Diffusion Models

Sander Elias Magnussen Helgesen, Kazuto Nakashima, Jim Tørresen, Ryo Kurazume

TL;DR

The paper addresses real-time upsampling of sparse LiDAR data to dense representations for autonomous robotics. It casts upsampling as image-based completion on spherical-projected range–reflectance images using a conditional DDPM with multiple inpainting masks. The approach delivers state-of-the-art fidelity for $4\times$ upsampling on KITTI-360 with substantial speedups (approximately $39\times$ faster) over prior diffusion-based baselines, achieving around $7.3$ FPS. It demonstrates robust generalization across real and synthetic datasets, and finds that multi-dataset training is not strictly necessary for strong performance.

Abstract

The search for refining 3D LiDAR data has attracted growing interest motivated by recent techniques such as supervised learning or generative model-based methods. Existing approaches have shown the possibilities for using diffusion models to generate refined LiDAR data with high fidelity, although the performance and speed of such methods have been limited. These limitations make it difficult to execute in real-time, causing the approaches to struggle in real-world tasks such as autonomous navigation and human-robot interaction. In this work, we introduce a novel approach based on conditional diffusion models for fast and high-quality sparse-to-dense upsampling of 3D scene point clouds through an image representation. Our method employs denoising diffusion probabilistic models trained with conditional inpainting masks, which have been shown to give high performance on image completion tasks. We introduce a series of experiments, including multiple datasets, sampling steps, and conditional masks. This paper illustrates that our method outperforms the baselines in sampling speed and quality on upsampling tasks using the KITTI-360 dataset. Furthermore, we illustrate the generalization ability of our approach by simultaneously training on real-world and synthetic datasets, introducing variance in quality and environments.
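The image representation mentioned above is, per the TL;DR, a spherical projection of the point cloud. As a rough illustration only, the sketch below maps a single 3D point to a pixel of a $64 \times 1024$ range image. The function name `project_point` and the vertical field of view (+3° to -25°) are assumptions made for this example and are not taken from the paper.

```python
import math

def project_point(x, y, z, height=64, width=1024,
                  fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map one 3D point to (row, col, range) on a spherical range image.

    The field-of-view defaults are illustrative assumptions, not values
    from the paper. Returns None for the origin or for points outside
    the vertical field of view.
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return None
    azimuth = math.atan2(y, x)        # horizontal angle in [-pi, pi]
    elevation = math.asin(z / r)      # vertical angle in [-pi/2, pi/2]
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    u = 0.5 * (1.0 - azimuth / math.pi)               # normalized column
    v = (fov_up - elevation) / (fov_up - fov_down)    # normalized row
    if not 0.0 <= v < 1.0:
        return None
    col = int(u * width) % width      # wrap around at the azimuth seam
    row = int(v * height)
    return row, col, r
```

A point straight ahead of the sensor, such as `project_point(10.0, 0.0, 0.0)`, lands in the middle column of the image; the range value `r` is what would be stored at that pixel.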

Paper Structure (27 sections, 1 equation, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Upsampling results using our method. We show the low-resolution input (left) and the high-resolution output obtained by our conditional diffusion model (right), each with the range image (top), the reflectance image (middle), and the point cloud converted from the range image (bottom). Compared to the existing approaches with 1,160 and 320 sampling steps [zyrianov2022learning, nakashima2024lidar], our method can produce better results in only 8 steps. For visual purposes, the reflectance and range images have been cropped from the original $64 \times 1024$ images.
  • Figure 2: Overview of our upsampling method using a diffusion model. In each reverse diffusion step, the known region is re-initialized by masked blending. Proj denotes the conversion between point clouds and images using spherical projection.
  • Figure 3: All masks are originally generated for $64 \times 1024$ images, but for readability, the following examples have been resized to $64 \times 512$. Each mask is binary and the black area is the area affected by the mask.
  • Figure 4: All metrics comparing our best model configuration C with the baseline R2DM. All metrics except IoU have been log-normalized for visual purposes.
  • Figure 5: Comparing the results of all baselines with Ours (Config C). The results are shown with point clouds on top, followed by range images, reflectance images, and finally semantic segmentation results.
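
The masked blending described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`make_row_mask`, `noise_known`, `masked_blend`) are hypothetical, the mask convention (1 = measured pixel) is chosen for this example, and the choice of keeping every fourth row for $4\times$ upsampling is assumed. The denoising network itself is omitted; `generated` stands in for its output at the current step.

```python
import math
import random

def make_row_mask(height=64, width=1024, factor=4):
    """Binary mask: 1 where the sparse scan has a measurement, 0 elsewhere.

    Assumes the sparse input keeps every `factor`-th row of the dense image.
    """
    return [[1 if row % factor == 0 else 0 for _ in range(width)]
            for row in range(height)]

def noise_known(x0, alpha_bar, rng):
    """Diffuse the known image to the current noise level (DDPM forward process)."""
    a = math.sqrt(alpha_bar)
    s = math.sqrt(1.0 - alpha_bar)
    return [[a * v + s * rng.gauss(0.0, 1.0) for v in row] for row in x0]

def masked_blend(known, generated, mask):
    """Keep known pixels where mask is 1, generated pixels where mask is 0."""
    return [[m * k + (1 - m) * g for k, g, m in zip(k_row, g_row, m_row)]
            for k_row, g_row, m_row in zip(known, generated, mask)]
```

In a sampling loop, each reverse step would call `noise_known` on the sparse input at that step's noise level and then `masked_blend` it with the model's sample, so the measured rows are re-initialized at every step while the remaining rows are filled in by the model.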