OLiDM: Object-aware LiDAR Diffusion Models for Autonomous Driving

Tianyi Yan, Junbo Yin, Xianpeng Lang, Ruigang Yang, Cheng-Zhong Xu, Jianbing Shen

TL;DR

OLiDM addresses the challenge of generating realistic and controllable LiDAR data for autonomous driving by introducing Object-Scene Progressive Generation (OPG) and Object Semantic Alignment (OSA). It jointly produces object-level point clouds $\hat{P^o} \in \mathbb{R}^{N^o \times 4}$ and scene-level point clouds $\hat{P^s} \in \mathbb{R}^{N^s \times 4}$ under conditions $\mathcal{C} = \{T, B\}$, with an object denoiser and a scene denoiser that interact through a scene controller. The OSA module aligns foreground features within semantic subspaces to reduce misalignment between foreground and background, improving object boundaries and overall scene fidelity, as evidenced by state-of-the-art Fréchet Point Cloud Distance (FPD) and Jensen–Shannon Divergence (JSD) on KITTI-360 and substantial gains in sparse-to-dense LiDAR completion and downstream 3D detection. Quantitatively, OLiDM achieves dramatic improvements in object-level fidelity (e.g., reduced Chamfer Distance and closer-to-real object counts) and enhances downstream detectors by about 2.7‰ in mAP over GT-Aug, validating its practical utility for perception pipelines. The framework supports versatile conditioning and partial-data scenarios, enabling robust conditional LiDAR generation for safety-focused autonomous driving research, with code available at the project page.
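For readers unfamiliar with the metrics named above, the following is a minimal NumPy sketch of a symmetric Chamfer Distance and of a Jensen–Shannon Divergence computed on bird's-eye-view (BEV) histograms. It is an illustration only, not the authors' evaluation code: the function names, the squared-distance Chamfer convention, the 50 m extent, and the 100-bin grid are all assumptions chosen for the example, and published numbers depend on the exact protocol used.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3).

    Assumed convention: mean squared nearest-neighbour distance, summed over
    both directions. Other works use unsquared distances or different scaling.
    """
    # Pairwise squared distances, shape (N, M). Brute force, so only suitable
    # for small clouds such as single objects.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


def bev_histogram(points: np.ndarray, extent: float = 50.0, bins: int = 100) -> np.ndarray:
    """Normalised 2D occupancy histogram over the x and y coordinates.

    `extent` and `bins` are illustrative defaults, not values from the paper.
    """
    edges = np.linspace(-extent, extent, bins + 1)
    hist, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=(edges, edges))
    total = hist.sum()
    return hist / total if total > 0 else hist


def jensen_shannon_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """JSD (natural logarithm) between two discrete distributions of equal shape."""
    p = p.ravel()
    q = q.ravel()
    m = 0.5 * (p + q)

    def kl(x: np.ndarray, y: np.ndarray) -> float:
        # Terms with x == 0 contribute nothing; y > 0 wherever x > 0 because
        # y is the mixture m.
        mask = x > 0
        return float(np.sum(x[mask] * np.log(x[mask] / y[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Under these assumptions, a generated scene `gen` and a real scene `real` (arrays whose first three columns are x, y, z) could be compared with `jensen_shannon_divergence(bev_histogram(gen), bev_histogram(real))`, and two object crops with `chamfer_distance(obj_a[:, :3], obj_b[:, :3])`. Lower is better for both.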

Abstract

To enhance autonomous driving safety in complex scenarios, various methods have been proposed to simulate LiDAR point cloud data. Nevertheless, these methods often face challenges in producing high-quality, diverse, and controllable foreground objects. To address the needs of object-aware tasks in 3D perception, we introduce OLiDM, a novel framework capable of generating high-fidelity LiDAR data at both the object and the scene levels. OLiDM consists of two pivotal components: the Object-Scene Progressive Generation (OPG) module and the Object Semantic Alignment (OSA) module. OPG adapts to user-specific prompts to generate desired foreground objects, which are subsequently employed as conditions in scene generation, ensuring controllable outputs at both the object and scene levels. This also facilitates the association of user-defined object-level annotations with the generated LiDAR scenes. Moreover, OSA aims to rectify the misalignment between foreground objects and background scenes, enhancing the overall quality of the generated objects. The broad effectiveness of OLiDM is demonstrated across various LiDAR generation tasks, as well as in 3D perception tasks. Specifically, on the KITTI-360 dataset, OLiDM surpasses prior state-of-the-art methods such as UltraLiDAR by 17.5 in FPD. Additionally, in sparse-to-dense LiDAR completion, OLiDM achieves a significant improvement over LiDARGen, with a 57.47% increase in semantic IoU. Moreover, OLiDM enhances the performance of mainstream 3D detectors by 2.4% in mAP and 1.9% in NDS, underscoring its potential in advancing object-aware 3D tasks. Code is available at: https://yanty123.github.io/OLiDM.
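As a point of reference for the semantic IoU figure quoted in the abstract, the sketch below computes per-class and mean intersection-over-union from per-point label arrays. It shows the standard IoU definition only. How the paper obtains labels and aggregates classes is not stated here, so the function name `per_class_iou` and the toy labels are assumptions made for illustration.

```python
import numpy as np


def per_class_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> np.ndarray:
    """Per-class IoU from integer label arrays of equal length.

    Classes that appear in neither array get NaN so they can be skipped
    when averaging.
    """
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()
    # Confusion matrix with rows = ground-truth class, columns = predicted class.
    cm = np.bincount(target * num_classes + pred, minlength=num_classes ** 2)
    cm = cm.reshape(num_classes, num_classes)
    tp = np.diag(cm).astype(float)
    # Union = predicted count + ground-truth count - intersection.
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    iou = np.full(num_classes, np.nan)
    valid = union > 0
    iou[valid] = tp[valid] / union[valid]
    return iou


# Toy example with two classes and four points.
pred = np.array([0, 0, 1, 1])
target = np.array([0, 1, 1, 1])
iou = per_class_iou(pred, target, num_classes=2)
mean_iou = float(np.nanmean(iou))
```

Using `np.nanmean` keeps classes absent from both prediction and ground truth from dragging the average toward zero; whether the reported numbers follow the same choice is not known from this summary.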

Paper Structure

This paper contains 26 sections, 9 equations, 17 figures, 5 tables.

Figures (17)

  • Figure 1: To assess the quality of foreground objects, we utilize an off-the-shelf 3D detector (i.e., SECOND [yan2018second] trained on KITTI [geiger2013kitti]) to identify 3D objects in LiDAR data generated by various methods, including LiDARGen, UltraLiDAR and our OLiDM. (a) Visualization of some detected 3D objects in generated LiDAR scenes, where OLiDM enables the creation of LiDAR data with high-fidelity foreground objects. (b-c) We count the detected objects and evaluate their quality, revealing that our generated LiDAR data follows a distribution similar to that of the real data in KITTI-360 [liao2022kitti360]. In contrast, LiDARGen and UltraLiDAR produce significantly fewer foreground objects with lower quality. (d) Foreground object points represent a minimal fraction of the total scene points, highlighting the challenges of generating high-quality foreground objects. Please zoom in for detailed visualization.
  • Figure 2: Pipeline of OLiDM. OLiDM is designed to generate diverse, controllable, and realistic LiDAR point clouds at both object and scene levels through the Object-Scene Progressive Generation (OPG) process. Object Generation: OPG carefully combines conditions such as text descriptions and 3D geometric context to accurately model LiDAR objects. Scene Generation: OPG then incorporates these generated objects as specific conditions during scene-level generation, supported by a scene controller and an object semantic alignment module.
  • Figure 3: The Object Semantic Alignment (OSA) module aligns object features based on their semantic space, enhancing the foreground object generation and contributing to the overall quality of the generated LiDAR scenes.
  • Figure 4: Qualitative comparison against baselines on LiDAR generation. We compare with LiDARGen and UltraLiDAR, and include real LiDAR for reference. OLiDM generates LiDAR data with more realistic sparsity and beam patterns. Red, blue, and green boxes mark the objects (car, cyclist, and pedestrian) detected by a 3D detector trained on KITTI.
  • Figure 5: Sparse-to-dense Completion. The semantic results are predicted by RangeNet-53 [milioto2019rangenet++].
  • ...and 12 more figures