L3DR: 3D-aware LiDAR Diffusion and Rectification

Quan Liu; Xiaoqin Zhang; Ling Shao; Shijian Lu

TL;DR

L3DR is a 3D-aware LiDAR Diffusion and Rectification framework that regresses and cancels range-view (RV) artifacts in 3D space. By predicting point-level offsets in 3D, it restores local geometry accurately and improves geometry realism.

Abstract

Range-view (RV) based LiDAR diffusion has recently made huge strides towards 2D photo-realism. However, it neglects 3D geometry realism and often generates various RV artifacts such as depth bleeding and wavy surfaces. We design L3DR, a 3D-aware LiDAR Diffusion and Rectification framework that can regress and cancel RV artifacts in 3D space and restore local geometry accurately. Our theoretical and empirical analysis reveals that 3D models are inherently superior to 2D models in generating sharp and authentic boundaries. Leveraging such analysis, we design a 3D residual regression network that rectifies RV artifacts and achieves superb geometry realism by predicting point-level offsets in 3D space. On top of that, we design a Welsch Loss that helps focus on local geometry and ignore anomalous regions effectively. Extensive experiments over multiple benchmarks including KITTI, KITTI360, nuScenes and Waymo show that the proposed L3DR achieves state-of-the-art generation and superior geometry-realism consistently. In addition, L3DR is generally applicable to different LiDAR diffusion models with little computational overhead.
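
The two ingredients named above, point-level offsets and a Welsch Loss, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the loss uses one common textbook form of the Welsch function, $1 - \exp(-\|r\|^2 / (2c^2))$, and the scale `c`, the helper names, and the toy data are assumptions made for illustration. The paper's exact parameterization is not given in this summary.

```python
import numpy as np

def welsch_loss(residual, c=0.5):
    """Per-point Welsch loss, 1 - exp(-||r||^2 / (2 c^2)).

    residual: (N, 3) array of 3D errors between rectified and GT points.
    c: scale parameter (hypothetical default, not taken from the paper).
    """
    sq_norm = np.sum(residual ** 2, axis=-1)
    return 1.0 - np.exp(-sq_norm / (2.0 * c ** 2))

def rectify(points, offsets):
    """Apply predicted point-level 3D offsets to generated points."""
    return points + offsets

# Toy data: three points with small errors and one anomalous point (5 m off).
gt = np.zeros((4, 3))
generated = np.array([
    [0.05, 0.0, 0.0],
    [0.0, 0.1, 0.0],
    [0.0, 0.0, -0.05],
    [5.0, 0.0, 0.0],
])

# Stand-in for the network output; a trained model would predict these.
predicted_offsets = np.zeros_like(generated)

rectified = rectify(generated, predicted_offsets)
per_point_welsch = welsch_loss(rectified - gt, c=0.5)
per_point_l2 = np.sum((rectified - gt) ** 2, axis=-1)
```

In this toy case the anomalous point contributes a squared error of 25 to an L2 objective but at most 1 to the Welsch objective, since the loss saturates for large residuals. That saturation is the property that lets training focus on local geometry rather than on anomalous regions.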

Paper Structure (58 sections, 3 theorems, 12 equations, 10 figures, 9 tables)

Introduction
Related Work
Diffusion Models
LiDAR Generation
Range view generation.
Re-sampled LiDAR from 3D representations.
Pilot Study
Preliminaries
Diffusion models.
Forward process.
Reverse sampling process.
Theoretical Analysis
Proof sketch.
Empirical Analysis
Method
...and 43 more sections

Key Result

Theorem 1

Given the assumption that diffusion UNets are Lipschitz continuous, the output image $x_0$ generated by DDIM is locally Lipschitz continuous with respect to the input noise $x_T$. Moreover, the spatial gradient of $x_0$ is bounded:

Figures (10)

Figure 1: L3DR effectively rectifies LiDAR range-view (RV) diffusion artifacts by selectively ignoring anomalous training regions. (a) Depth bleeding creates fake points between the foreground vehicle and the background. (b) Wavy surfaces and rounded edges synthesized by RV diffusion are straightened and sharpened after rectification. (c) Anomalous regions in training data pairs, e.g., a diffusion-generated wall perpendicular to the ground truth (GT), can overshadow RV artifacts and hijack the artifact removal task; these are suppressed with the Welsch Loss (see Section \ref{sec:decomposition_of_objective}). Generated and rectified point clouds are colored while GT point clouds are in gray.
Figure 2: Empirical validation of Theorem \ref{th:continuous}. The graph shows the distribution of $\|\nabla x\|$ for GT, vanilla RV diffusion, and our rectified RV, including the corresponding Jensen-Shannon Divergence (JSD) w.r.t. the GT.
Figure 3: The training pipeline of the proposed L3DR framework. In the LiDAR diffusion training stage, generated and ground-truth point cloud pairs are collected using semantic-conditioned LiDAR diffusion. In the residual regression training stage, such data pairs are employed to train a 3D network to remove RV artifacts present in the residuals to improve generation quality.
Figure 4: Visualization of two types of errors in the training data of the residual regression network (RRN). In most regions the generated point clouds (colored) approximate the GT (gray) with only high-variance errors, i.e., RV artifacts, as highlighted with green dotted lines. There are also regions with high-bias errors, including (1) shifted walls, (2) random points on the leaves where laser hits are hard to predict, and (3) isolated chunks with consistent depth error. These bias-dominated regions are harmful for RRN training.
Figure 5: The inference pipeline of the proposed L3DR.
...and 5 more figures
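
The comparison described in the Figure 2 caption, a distribution of $\|\nabla x\|$ scored by JSD against the GT, can be sketched as follows. This is an illustrative reconstruction only: the finite-difference operator (`np.gradient`), the histogram bins, and the synthetic range images are assumptions, and the paper's exact evaluation protocol is not given in this summary.

```python
import numpy as np

def gradient_magnitude(range_image):
    """Spatial gradient magnitude of a 2D range image via finite differences."""
    grad_rows, grad_cols = np.gradient(range_image)
    return np.sqrt(grad_rows ** 2 + grad_cols ** 2).ravel()

def gradient_histogram(range_image, bins):
    """Histogram of gradient magnitudes over shared bin edges."""
    counts, _ = np.histogram(gradient_magnitude(range_image), bins=bins)
    return counts.astype(float)

def jsd(p, q):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Synthetic GT: a sharp depth step between foreground (10 m) and background (30 m).
gt = np.full((16, 64), 10.0)
gt[:, 32:] = 30.0

# Synthetic "depth bleeding": the step is replaced by a smooth ramp.
blurred = gt.copy()
blurred[:, 28:36] = np.linspace(10.0, 30.0, 8)

bins = np.linspace(0.0, 12.0, 25)  # hypothetical bin edges
gt_hist = gradient_histogram(gt, bins)
blurred_hist = gradient_histogram(blurred, bins)

jsd_self = jsd(gt_hist, gt_hist)
jsd_blurred = jsd(gt_hist, blurred_hist)
```

On this toy input the smoothed boundary shifts gradient mass from large values to moderate ones, so its histogram diverges from the GT histogram, while a distribution compared with itself has zero divergence.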

Theorems & Definitions (4)

Theorem 1
Corollary 1
Corollary 2
proof
