LiDAR-EDIT: LiDAR Data Generation by Editing the Object Layouts in Real-World Scenes

Shing-Hei Ho, Bao Thach, Minghan Zhu

TL;DR

LiDAR-EDIT generates controllable, realistic synthetic LiDAR data by editing object layouts within real-world scans while preserving the background. Formally, the target scan is $P_T = f(S, D_T)$, given a background $S$ and an edited foreground $D_T$; the task is decomposed into object removal, point-cloud completion, and object insertion. The method introduces spherical voxelization with coordinates $(r, \theta, \phi)$ to mirror LiDAR rays and enforce consistent occlusion and density, plus a two-stage background inpainting pipeline inspired by UltraLidar. An object library of full-shape point clouds is constructed, with completion performed by AnchorFormer, to enable arbitrary insertions. Experiments on nuScenes-LidarSeg demonstrate realistic edits with small domain gaps and show that synthetic pretraining can improve downstream detection, highlighting practical impact for autonomous driving.

Abstract

We present LiDAR-EDIT, a novel paradigm for generating synthetic LiDAR data for autonomous driving. Our framework edits real-world LiDAR scans by introducing new object layouts while preserving the realism of the background environment. Compared to end-to-end frameworks that generate LiDAR point clouds from scratch, LiDAR-EDIT offers users full control over the object layout, including the number, type, and pose of objects, while keeping most of the original real-world background. Our method also provides object labels for the generated data. Compared to novel view synthesis techniques, our framework allows for the creation of counterfactual scenarios with object layouts significantly different from the original real-world scene. LiDAR-EDIT uses spherical voxelization to enforce correct LiDAR projective geometry in the generated point clouds by construction. During object removal and insertion, generative models are employed to fill the unseen background and object parts that were occluded in the original real LiDAR scans. Experimental results demonstrate that our framework produces realistic LiDAR scans with practical value for downstream tasks.
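
To make the spherical voxelization idea concrete, the following is a minimal sketch of how a Cartesian point could be mapped to a discrete $(r, \theta, \phi)$ bin. It is not the authors' implementation: the function name `spherical_voxel_index`, the grid resolution, the maximum range, and the elevation limits are all hypothetical values chosen only for illustration.

```python
import math


def spherical_voxel_index(x, y, z,
                          r_max=50.0, n_r=100, n_theta=360, n_phi=32,
                          phi_min=-math.pi / 6, phi_max=math.pi / 18):
    """Map a point (sensor at the origin) to an (i_r, i_theta, i_phi) bin.

    All grid parameters are illustrative assumptions, not values from the paper.
    Returns None when the point falls outside the grid.
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0 or r >= r_max:
        return None

    theta = math.atan2(y, x)   # azimuth (horizontal angle) in [-pi, pi]
    phi = math.asin(z / r)     # elevation (vertical angle) in [-pi/2, pi/2]
    if not (phi_min <= phi < phi_max):
        return None

    # Uniform bins along each spherical axis.
    i_r = int(r / (r_max / n_r))
    i_theta = int((theta + math.pi) / (2 * math.pi) * n_theta) % n_theta
    i_phi = int((phi - phi_min) / (phi_max - phi_min) * n_phi)
    return i_r, i_theta, i_phi


# Two points on the same ray from the sensor share the angular bins
# and differ only in the radial bin.
near = spherical_voxel_index(10.0, 5.0, -1.0)
far = spherical_voxel_index(20.0, 10.0, -2.0)
print(near, far)
```

Because each angular bin corresponds to a bundle of rays leaving the sensor, points that lie along the same line of sight land in the same `(i_theta, i_phi)` column, which is what makes ray-based reasoning simple in this representation.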

Paper Structure

This paper contains 23 sections, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Overview of our novel LiDAR editing paradigm. Given the point cloud of a real-world LiDAR scan, we want to freely change the objects and their poses while preserving the background environment. This requires filling the background when objects are removed, and handling occlusion and LiDAR scan projection when new objects are inserted. Edited points are highlighted in red.
  • Figure 2: Overview of our novel LiDAR-EDIT framework for LiDAR editing. Asterisks denote modules with generative models.
  • Figure 3: (Left) Spherical voxelization discretizes the space based on radius $r$ (distance from the origin), azimuth $\theta$ (horizontal angle), and elevation $\phi$ (vertical angle). (Right) Occlusion handling in spherical representation is straightforward. If a voxel at coordinate $(r, \theta, \phi)$ is occupied (Green), all voxels with the same azimuth and elevation but a larger radius ($(r', \theta, \phi)$ where $r' > r$) will be occluded (Red).
  • Figure 4: Overview of the background inpainting model. Training has two stages. (a) A VQ-VAE model learns a discrete latent map in the bird's-eye view. The colors represent the discrete latent codes. (b) A multi-step autoregressive generation model learns to fill the masked tokens in the latent map, which is then decoded into a full point cloud. Inpainted points are marked in red.
  • Figure 5: Illustration of the inpainting mask creation process in the background inpainting experiment. (a) The object-free sectors in the original scan. (b) A nominal bounding box of average size, placed at a distance of 10 meters, is used to create the mask. The bounding box can be rotated to fit in an object-free sector. (c) An example inpainting mask created from a rotated bounding box.
  • ...and 2 more figures
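
The occlusion rule described in the Figure 3 caption can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: it assumes voxels are given as hypothetical integer index triples `(i_r, i_theta, i_phi)` and that each angular direction keeps only its nearest occupied voxel.

```python
def resolve_occlusion(occupied):
    """Keep, for each (i_theta, i_phi) direction, only the nearest occupied voxel.

    `occupied` is an iterable of (i_r, i_theta, i_phi) index triples.
    Voxels with the same azimuth and elevation but a larger radial index
    are treated as occluded and dropped.
    """
    nearest = {}
    for i_r, i_theta, i_phi in occupied:
        ray = (i_theta, i_phi)
        if ray not in nearest or i_r < nearest[ray]:
            nearest[ray] = i_r
    return {(i_r, i_theta, i_phi) for (i_theta, i_phi), i_r in nearest.items()}


# Inserting an object voxel in front of a background voxel on the same ray
# hides the background voxel; other rays are unaffected.
background = {(40, 10, 5), (30, 11, 5)}
inserted_object = {(20, 10, 5)}
visible = resolve_occlusion(background | inserted_object)
print(sorted(visible))
```

In a Cartesian grid the same check would require tracing each ray through the voxels it crosses; in the spherical grid it reduces to a minimum over the radial index per angular column.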