Range-Edit: Semantic Mask Guided Outdoor LiDAR Scene Editing

Suchetan G. Uppur, Hemant Kumar, Vaibhav Kumar

TL;DR

This work introduces Range-Edit, a diffusion-based framework for editing real LiDAR scans via semantic mask conditioning in range-view space. The method projects a point cloud to a range image, applies convex-hull semantic masks, and conditions a Latent Diffusion Model on the masked input and the interpolated mask; the generated range image is then mapped back to 3D via inverse range projection, yielding edited LiDAR scenes that are geometrically consistent and realistic. The model objective $L_{LDM}$ is tailored to focus the RMSE on the masked regions, enabling precise object-level edits and edge-case generation. The approach is validated on the KITTI-360 dataset, with improvements attributed to convex hull masking and the region-focused loss. It offers a cost-effective, scalable way to augment LiDAR data for autonomous driving, enabling targeted edge-case and dynamic-scene generation without relying on full LiDAR simulations or handcrafted 3D models.
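As a rough illustration of what a region-focused error looks like, the sketch below computes RMSE only over the masked pixels of a range image. It is not the paper's $L_{LDM}$, whose exact form is not given here: the function name `masked_rmse`, the nested-list image layout, and the toy values are all assumptions made for this example.

```python
import math


def masked_rmse(pred, target, mask):
    """RMSE over pixels where mask is nonzero (hypothetical helper).

    pred, target and mask are equally sized 2D lists (rows of pixels).
    """
    squared_error_sum = 0.0
    pixel_count = 0
    for pred_row, target_row, mask_row in zip(pred, target, mask):
        for p, t, m in zip(pred_row, target_row, mask_row):
            if m:  # only masked pixels contribute to the error
                squared_error_sum += (p - t) ** 2
                pixel_count += 1
    if pixel_count == 0:
        raise ValueError("mask selects no pixels")
    return math.sqrt(squared_error_sum / pixel_count)


# Toy 2x3 range images (values in metres); only the middle column is masked.
target = [[10.0, 12.0, 30.0], [10.0, 12.0, 30.0]]
pred = [[0.0, 15.0, 30.0], [0.0, 16.0, 30.0]]
mask = [[0, 1, 0], [0, 1, 0]]

print(masked_rmse(pred, target, mask))
```

The large errors in the first column lie outside the mask and do not affect the result, which is the behaviour a region-focused objective is meant to have.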

Abstract

Training autonomous driving and navigation systems requires large and diverse point cloud datasets that capture complex edge-case scenarios from various dynamic urban settings. Acquiring such diverse scenarios from real-world point cloud data, especially for critical edge cases, is challenging, which restricts system generalization and robustness. Current methods rely on simulating point cloud data within handcrafted 3D virtual environments, which is time-consuming, computationally expensive, and often fails to fully capture the complexity of real-world scenes. To address some of these issues, this research proposes a novel approach that edits real-world LiDAR scans using semantic mask-based guidance to generate novel synthetic LiDAR point clouds. We incorporate range image projection and semantic mask conditioning to achieve diffusion-based generation. Point clouds are transformed into 2D range-view images, which serve as an intermediate representation that enables semantic editing with convex hull-based semantic masks. These masks guide the generation process by providing information on the dimensions, orientations, and locations of objects in the real environment, ensuring geometric consistency and realism. The approach demonstrates high-quality LiDAR point cloud generation, capable of producing complex edge cases and dynamic scenes, as validated on the KITTI-360 dataset. It offers a cost-effective and scalable solution for generating diverse LiDAR data, a step toward improving the robustness of autonomous driving systems.
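The transformation between a point cloud and a 2D range-view image can be sketched with a common spherical projection. This is a minimal sketch, not the authors' implementation: the function names, the 64 x 1024 resolution, and the +3 to -25 degree vertical field of view are assumptions chosen for illustration, and only the range channel is stored.

```python
import math


def project_to_range_image(points, height=64, width=1024,
                           fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project (x, y, z) points to a range image; 0.0 marks "no return".

    Resolution and field of view are assumed values, not taken from the paper.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue
        yaw = math.atan2(y, x)      # azimuth in (-pi, pi]
        pitch = math.asin(z / r)    # elevation
        if not (fov_down <= pitch <= fov_up):
            continue                # outside the assumed vertical field of view
        col = int(0.5 * (1.0 - yaw / math.pi) * width) % width
        row = min(height - 1, int((fov_up - pitch) / fov * height))
        # When several points share a pixel, keep the closest return.
        if image[row][col] == 0.0 or r < image[row][col]:
            image[row][col] = r
    return image


def unproject_range_image(image, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Inverse projection: map each valid pixel back to a 3D point
    along the ray through the pixel centre."""
    height, width = len(image), len(image[0])
    fov_up = math.radians(fov_up_deg)
    fov = fov_up - math.radians(fov_down_deg)
    points = []
    for row in range(height):
        pitch = fov_up - (row + 0.5) / height * fov
        for col in range(width):
            r = image[row][col]
            if r == 0.0:
                continue
            yaw = math.pi * (1.0 - 2.0 * (col + 0.5) / width)
            points.append((r * math.cos(pitch) * math.cos(yaw),
                           r * math.cos(pitch) * math.sin(yaw),
                           r * math.sin(pitch)))
    return points


# One point straight ahead, one directly overhead (outside the field of view).
cloud = [(10.0, 0.0, 0.0), (0.0, 0.0, 5.0)]
range_image = project_to_range_image(cloud)
restored = unproject_range_image(range_image)
print(restored)
```

The round trip preserves each point's range exactly but snaps its direction to the pixel centre, so the recovered position differs from the original by an angular quantization error that depends on the chosen resolution.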

Paper Structure

This paper contains 18 sections, 3 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: An overview of the proposed method from the input point cloud to the generated output point cloud.
  • Figure 2: Range-View Projection
  • Figure 3: (a) Comparison between the mask made by the convex hull of the projected points and the mask of the projected points. (b) Mask formed from projected points. (c) Mask formed using a convex hull on the projected points.
  • Figure 4: Qualitative results showing two examples of generated range-view images. (a) A close-up view of the intensity channel of the range-view image, where fine details like rear-light and license plate intensities are generated. (b) A close-up view of the range channel of the range-view image, where the ray-drop effect due to the windshield and windows is successfully generated.
  • Figure 5: Qualitative results showing point cloud generation quality as BEV images on input masked point clouds. Masked point cloud (top row) vs generated point cloud (middle row) vs ground truth (bottom row).
  • ...and 4 more figures
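
The contrast described in the Figure 3 caption, between a mask built from the projected points alone and a mask built from their convex hull, can be illustrated with a small sketch. It assumes the object's points have already been projected to integer `(col, row)` pixel coordinates; the helper names, the monotone-chain hull, and the toy pixel set are choices made for this example rather than details from the paper, and wrap-around at the azimuth seam of the range image is not handled.

```python
def cross(o, a, b):
    """2D cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(pixels):
    """Andrew's monotone chain; returns hull vertices in a consistent winding."""
    pts = sorted(set(pixels))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def point_mask(pixels, height, width):
    """Mask that marks only the pixels hit by projected points."""
    mask = [[0] * width for _ in range(height)]
    for col, row in pixels:
        mask[row][col] = 1
    return mask


def hull_mask(pixels, height, width):
    """Mask that marks every pixel inside the convex hull of the points."""
    hull = convex_hull(pixels)
    if len(hull) < 3:
        # Degenerate case (single point or collinear points): no area to fill.
        return point_mask(pixels, height, width)
    n = len(hull)
    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            # A pixel is inside if it lies on the same side of every hull edge.
            if all(cross(hull[i], hull[(i + 1) % n], (col, row)) >= 0
                   for i in range(n)):
                mask[row][col] = 1
    return mask


# Sparse returns from one object in a 6 x 9 image: four corners and one
# interior hit, given as (col, row).
object_pixels = [(2, 1), (6, 1), (2, 4), (6, 4), (4, 2)]
sparse = point_mask(object_pixels, height=6, width=9)
filled = hull_mask(object_pixels, height=6, width=9)
for row in filled:
    print("".join("#" if v else "." for v in row))
```

The point mask leaves gaps wherever the object produced no return, while the hull mask covers the object's whole footprint in the range image, which is the difference the caption describes.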