Range-Edit: Semantic Mask Guided Outdoor LiDAR Scene Editing

Suchetan G. Uppur; Hemant Kumar; Vaibhav Kumar

TL;DR

This work introduces Range-Edit, a diffusion-based framework for editing real LiDAR scans through semantic mask conditioning in range-view space. Point clouds are projected to range images, convex-hull semantic masks mark the regions to edit, and a Latent Diffusion Model conditioned on the masked input and the interpolated mask generates edited scenes that are geometrically consistent and realistic. These are then mapped back to 3D by inverse range projection. The training objective $L_{LDM}$ is tailored to concentrate the RMSE on the masked regions, which enables precise object-level edits and edge-case generation. The method is validated on the KITTI-360 dataset, where both convex-hull masking and the region-focused loss are reported to improve results. The approach offers a cost-effective, scalable way to augment LiDAR data for autonomous driving, enabling targeted edge-case and dynamic-scene generation without full LiDAR simulation or handcrafted 3D models.
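The exact form of $L_{LDM}$ is not spelled out in this summary. A minimal sketch, assuming the region focus is an RMSE restricted to masked positions and that the mask is brought to latent resolution by block-wise max pooling (one possible reading of "interpolated mask"), could look like this in NumPy. The function names `downsample_mask` and `masked_rmse`, the 8x latent downsampling, and the noise-prediction setup in the usage lines are all illustrative assumptions, not the authors' code.

```python
import numpy as np


def downsample_mask(mask: np.ndarray, factor: int) -> np.ndarray:
    """Shrink a binary (H, W) mask to latent resolution (H // factor, W // factor).

    Assumption: block-wise max pooling, so a latent cell counts as "edited"
    if any pixel in its block is masked. Bilinear interpolation followed by
    a threshold would be an alternative.
    """
    h, w = mask.shape
    if h % factor or w % factor:
        raise ValueError("mask size must be divisible by factor")
    blocks = mask.reshape(h // factor, factor, w // factor, factor)
    return blocks.max(axis=(1, 3)).astype(np.float32)


def masked_rmse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray,
                eps: float = 1e-8) -> float:
    """RMSE over positions where mask == 1 (hypothetical region-focused loss).

    The mask is broadcast against `pred`, so an (h, w) mask can weight a
    (C, h, w) latent or noise tensor in every channel.
    """
    mask = np.broadcast_to(mask, pred.shape).astype(np.float64)
    sq_err = (pred - target) ** 2 * mask
    return float(np.sqrt(sq_err.sum() / (mask.sum() + eps)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pixel_mask = np.zeros((64, 1024), dtype=np.float32)
    pixel_mask[20:40, 300:360] = 1.0               # region to be regenerated
    latent_mask = downsample_mask(pixel_mask, 8)   # (8, 128), assumed 8x VAE downsampling
    eps_true = rng.standard_normal((4, 8, 128))    # noise added at some diffusion step
    eps_pred = rng.standard_normal((4, 8, 128))    # stand-in for the denoiser's prediction
    print(masked_rmse(eps_pred, eps_true, latent_mask))
```

With an all-ones mask, `masked_rmse` reduces to an ordinary RMSE over the whole tensor, which makes it easy to compare a region-focused objective against an unmasked one.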

Abstract

Training autonomous driving and navigation systems requires large and diverse point cloud datasets that capture complex edge-case scenarios from varied, dynamic urban settings. Acquiring such diverse scenarios from real-world point cloud data, especially critical edge cases, is challenging, which limits system generalization and robustness. Current methods rely on simulating point cloud data within handcrafted 3D virtual environments, which is time-consuming and computationally expensive, and often fails to capture the full complexity of real-world scenes. To address some of these issues, this research proposes a novel approach that edits real-world LiDAR scans using semantic mask-based guidance to generate novel synthetic LiDAR point clouds. We combine range image projection with semantic mask conditioning for diffusion-based generation. Point clouds are transformed into 2D range-view images, which serve as an intermediate representation that enables semantic editing with convex hull-based semantic masks. These masks guide the generation process by encoding the dimensions, orientations, and locations of objects in the real environment, ensuring geometric consistency and realism. The approach produces high-quality LiDAR point clouds, including complex edge cases and dynamic scenes, as validated on the KITTI-360 dataset. It offers a cost-effective and scalable solution for generating diverse LiDAR data, a step toward more robust autonomous driving systems.
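The range-view round trip and the convex-hull masks described above can be sketched with a standard spherical projection of the kind common in range-image LiDAR work. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the 64 x 1024 image size and the +3° / -25° vertical field of view are assumed values typical of a 64-beam sensor, all function names are hypothetical, and the mask is simply the filled 2D convex hull of an object's projected pixels (wrap-around at the azimuth seam is ignored).

```python
import numpy as np

H, W = 64, 1024                                        # assumed range-image size
FOV_UP, FOV_DOWN = np.radians(3.0), np.radians(-25.0)  # assumed vertical field of view
FOV = FOV_UP - FOV_DOWN


def range_project(points: np.ndarray) -> np.ndarray:
    """Project (N, 3) points to an (H, W) range image (0 marks an empty pixel)."""
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(np.clip(points[:, 2] / np.maximum(r, 1e-8), -1.0, 1.0))
    u = np.floor(0.5 * (1.0 - yaw / np.pi) * W).astype(int) % W
    v = np.clip(np.floor((1.0 - (pitch - FOV_DOWN) / FOV) * H).astype(int), 0, H - 1)
    img = np.full((H, W), np.inf)
    np.minimum.at(img, (v, u), r)   # keep the nearest return when points share a pixel
    img[np.isinf(img)] = 0.0
    return img


def range_unproject(img: np.ndarray) -> np.ndarray:
    """Inverse projection: one 3D point per non-empty pixel, at the pixel-centre angles."""
    v, u = np.nonzero(img)
    r = img[v, u]
    yaw = np.pi * (1.0 - 2.0 * (u + 0.5) / W)
    pitch = FOV_DOWN + (1.0 - (v + 0.5) / H) * FOV
    return np.stack([r * np.cos(pitch) * np.cos(yaw),
                     r * np.cos(pitch) * np.sin(yaw),
                     r * np.sin(pitch)], axis=1)


def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted({(int(a), int(b)) for a, b in pts})
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def convex_hull_mask(u, v, h=H, w=W) -> np.ndarray:
    """Binary (h, w) mask filling the convex hull of an object's pixels (u = column, v = row)."""
    hull = convex_hull(zip(u, v))
    us, vs = [p[0] for p in hull], [p[1] for p in hull]
    uu, vv = np.meshgrid(np.arange(min(us), max(us) + 1), np.arange(min(vs), max(vs) + 1))
    inside = np.ones(uu.shape, dtype=bool)
    for a, b in zip(hull, hull[1:] + hull[:1]):
        # a pixel is inside if it lies left of (or on) every counter-clockwise edge
        inside &= (b[0] - a[0]) * (vv - a[1]) - (b[1] - a[1]) * (uu - a[0]) >= 0
    mask = np.zeros((h, w), dtype=np.float32)
    mask[vv[inside], uu[inside]] = 1.0
    return mask
```

Applying `range_unproject` to an edited range image yields the edited point cloud. Because each pixel keeps at most one return, the round trip is lossy whenever several points fall into the same pixel.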
