Paved2Paradise: Cost-Effective and Scalable LiDAR Simulation by Factoring the Real World

Michael A. Alcorn, Noah Schwartz

TL;DR

This work tackles the high cost of obtaining diverse, labeled LiDAR data by introducing Paved2Paradise (P2P), a pipeline that factors real-world scenes into separate background and object datasets to generate a combinatorially large, fully annotated synthetic lidar corpus. It levels ground planes, places objects with perspective-consistent transformations, merges them with backgrounds, and simulates realistic occlusion and sensor effects, all with minimal manual annotation. In experiments on human detection in orchards and pedestrian detection in urban scenes, models trained solely on P2P data perform strongly in occluded scenarios and achieve results competitive with, and in some cases comparable to, those of baselines trained on real data. The approach offers a scalable, cost-effective path to accelerate 3D perception development in sectors where lidar data are expensive to collect, with potential extensions to richer sensor models and weather effects.

Abstract

To achieve strong real-world performance, neural networks must be trained on large, diverse datasets; however, obtaining and annotating such datasets is costly and time-consuming, particularly for 3D point clouds. In this paper, we describe Paved2Paradise, a simple, cost-effective approach for generating fully labeled, diverse, and realistic lidar datasets from scratch, all while requiring minimal human annotation. Our key insight is that, by deliberately collecting separate "background" and "object" datasets (i.e., "factoring the real world"), we can intelligently combine them to produce a combinatorially large and diverse training set. The Paved2Paradise pipeline thus consists of four steps: (1) collecting copious background data, (2) recording individuals from the desired object class(es) performing different behaviors in an isolated environment (like a parking lot), (3) bootstrapping labels for the object dataset, and (4) generating samples by placing objects at arbitrary locations in backgrounds. To demonstrate the utility of Paved2Paradise, we generated synthetic datasets for two tasks: (1) human detection in orchards (a task for which no public data exists) and (2) pedestrian detection in urban environments. Qualitatively, we find that a model trained exclusively on Paved2Paradise synthetic data is highly effective at detecting humans in orchards, including when individuals are heavily occluded by tree branches. Quantitatively, a model trained on Paved2Paradise data that sources backgrounds from KITTI performs comparably to a model trained on the real KITTI dataset. These results suggest the Paved2Paradise synthetic data pipeline can help accelerate point cloud model development in sectors where acquiring lidar datasets has previously been cost-prohibitive.
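Step (4) is what makes every synthetic sample "fully labeled": because any object scene can be paired with any background scene and any location, the generator always knows which points belong to the inserted object. The sketch below shows only the merge-and-label part of that step, under stated assumptions. The function name `combine_and_label`, the per-point 0/1 labels, and the axis-aligned box are illustrative choices, not the paper's code; the paper's bootstrapped object labels (step 3) may be oriented boxes instead.

```python
import numpy as np


def combine_and_label(background: np.ndarray, placed_object: np.ndarray):
    """Hypothetical helper: merge a background scan with an already-placed object.

    background:    (M, 3) xyz points from a background scene.
    placed_object: (N, 3) xyz points already moved to their target location.
    Returns the merged (M + N, 3) cloud, per-point labels (0 = background,
    1 = object), and an axis-aligned box (min_xyz, max_xyz) around the object.
    """
    points = np.concatenate([background, placed_object], axis=0)
    # Labels are carried alongside the points so that any later filtering
    # (e.g., an occlusion step) can drop points and labels together.
    labels = np.concatenate([
        np.zeros(len(background), dtype=np.int64),
        np.ones(len(placed_object), dtype=np.int64),
    ])
    # Assumption: an axis-aligned box stands in for the real annotation format.
    box = (placed_object.min(axis=0), placed_object.max(axis=0))
    return points, labels, box
```

Since the labels come from bookkeeping rather than from a human annotator, the only manual labeling cost is the one-time bootstrapping of the object dataset.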

Paper Structure

This paper contains 17 sections, 9 equations, 8 figures, and 1 table.

Figures (8)

  • Figure 1: An overview of the Paved2Paradise pipeline for generating synthetic scenes (left) and an example synthetic scene (right). Left: The pipeline begins by randomly sampling an object scene (top left) and a background scene (bottom right) and "leveling" both scenes (Section \ref{sec:level}). The leveling step ensures the object point cloud extracted from the object scene will be placed on the ground in the background scene. Next, the object point cloud is extracted from the object scene and placed at a randomly sampled location in the detection region (Section \ref{sec:place}). During the placement step, the object point cloud is rotated to ensure perspective consistency relative to the sensor's location. The repositioned object point cloud is then combined with the background point cloud (Section \ref{sec:combine}), and a final occluding procedure removes points from the scene based on their visibility from the sensor's perspective (Section \ref{sec:occlude}). Right: The resulting synthetic scene is both realistic and automatically annotated.
  • Figure 2: Because this sensor's $xy$-plane was not level with the ground plane, naively rotating the human point cloud around the sensor's $z$-axis results in the human's feet being underground.
  • Figure 3: The Paved2Paradise leveling procedure estimates the ground plane for a scene (top) by performing linear regression on ground points (middle) obtained by finding the nearest neighbors to a set of grid points beneath the scene (bottom).
  • Figure 4: Naively placing a point cloud at an arbitrary location in a scene can produce a synthetic lidar scan that is physically impossible. Top Left: In the original scene, the cone is directly in front of the sensor (indicated by the axes) with the cone's base perpendicular to the sensor's $x$-axis. Top Right: Physically moving the cone to the left of the sensor (without rotating the object) results in an entirely different lidar scan. Bottom Left: However, naively translating the point cloud from the original scene to the left of the sensor leads to an impossible lidar scan. Bottom Right: Rotating the translated point cloud produces a perspective-consistent synthetic lidar scan.
  • Figure 5: The Paved2Paradise object occluding procedure. Given a set of object points $\widehat{\mathbf{O}}$ (here, $\widehat{\mathbf{O}}$ contains a single point, i.e., $\widehat{\mathbf{O}} = \{\hat{\mathbf{o}}_{1}\}$), Paved2Paradise extracts a subset of (reindexed; see Section \ref{sec:occlude}) background points $\mathbf{B}_{\beta} \subset \mathbf{B}$ such that each $\mathbf{b}_{j} \in \mathbf{B}_{\beta}$ is both in the same sector as $\widehat{\mathbf{O}}$ (delineated by the gray cone) and closer to the sensor (the blue cylinder) than at least one point in $\widehat{\mathbf{O}}$. Next, for each object point $\hat{\mathbf{o}}_{i}$, Paved2Paradise calculates the minimum distances $d_{i,j}$ between the points in $\mathbf{B}_{\beta}$ and the ray $r(s, \hat{\mathbf{o}}_{i}) = s \frac{\hat{\mathbf{o}}_{i}}{\lVert \hat{\mathbf{o}}_{i} \rVert}$. Because the closest point along $r(s, \hat{\mathbf{o}}_{i})$ to $\mathbf{b}_{j}$ is the projection of $\mathbf{b}_{j}$ onto the ray, i.e., $(\mathbf{b}_{j} \cdot \frac{\hat{\mathbf{o}}_{i}}{\lVert \hat{\mathbf{o}}_{i} \rVert}) \frac{\hat{\mathbf{o}}_{i}}{\lVert \hat{\mathbf{o}}_{i} \rVert}$, each $d_{i,j}$ can be calculated using the Pythagorean theorem. Finally, if any of the $d_{i,j}$ values are less than some user-defined threshold, then $\hat{\mathbf{o}}_{i}$ is considered "occluded" and dropped from the final point cloud for the scene.
  • ...and 3 more figures
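The leveling step in Figure 3 can be sketched as follows. A grid of probe points is placed beneath the scene, the nearest scene point to each probe is taken as a ground point, a plane $z = ax + by + c$ is fit by least squares, and the scene is rotated so the plane's normal aligns with the sensor's $z$-axis. This is a minimal sketch under assumptions. The names `level_scene` and `rotation_aligning`, the grid spacing, the probe depth, and the choice to rotate about the sensor origin (returning the ground height instead of translating) are illustrative. The brute-force nearest-neighbor search is $O(GN)$ in memory; a k-d tree would normally replace it.

```python
import numpy as np


def rotation_aligning(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Rotation matrix R with R @ u == v for unit vectors u, v (Rodrigues' formula).

    Assumes u and v are not antiparallel (true here: the ground normal has +z).
    """
    k = np.cross(u, v)
    s = np.linalg.norm(k)
    c = float(np.dot(u, v))
    if s < 1e-12:  # already aligned
        return np.eye(3)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + K + K @ K * ((1.0 - c) / s**2)


def level_scene(points: np.ndarray, grid_step: float = 1.0, depth: float = 1.0):
    """Hypothetical leveling helper. Returns (leveled_points, ground_z).

    The sensor stays at the origin; after leveling, the ground is the
    horizontal plane z = ground_z.
    """
    # 1. Probe grid spanning the scene's xy extent, `depth` below its lowest point.
    mins, maxs = points.min(axis=0), points.max(axis=0)
    xs = np.arange(mins[0], maxs[0] + grid_step, grid_step)
    ys = np.arange(mins[1], maxs[1] + grid_step, grid_step)
    gx, gy = np.meshgrid(xs, ys)
    grid = np.stack([gx.ravel(), gy.ravel(), np.full(gx.size, mins[2] - depth)], axis=1)

    # 2. Nearest scene point to each probe (brute force) -> candidate ground points.
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    ground = points[np.unique(d2.argmin(axis=1))]

    # 3. Least-squares fit of z = a*x + b*y + c0 to the ground points.
    A = np.column_stack([ground[:, 0], ground[:, 1], np.ones(len(ground))])
    (a, b, c0), *_ = np.linalg.lstsq(A, ground[:, 2], rcond=None)

    # 4. Rotate the plane normal (-a, -b, 1) onto +z. Points on the plane then
    #    share the height n_hat . p = c0 / ||(-a, -b, 1)||.
    normal = np.array([-a, -b, 1.0])
    norm = np.linalg.norm(normal)
    R = rotation_aligning(normal / norm, np.array([0.0, 0.0, 1.0]))
    return points @ R.T, c0 / norm
```

Keeping the sensor at the origin means that, after leveling, a rotation about the sensor's $z$-axis keeps an object's feet on the ground. That is the failure mode Figure 2 illustrates.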
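Figure 4's perspective-consistent placement can be approximated as follows. Rotate the object's points about the sensor's $z$-axis by the change in azimuth, so the side that faced the sensor still faces it, then slide the object radially to the target range. This is one transform consistent with Figures 2 and 4 and the Figure 1 caption, offered as a sketch. The paper's exact formulation (Section \ref{sec:place}) may differ. The name `place_object`, the use of the mean $xy$ position as the object's center, and the assumption that object and background are already leveled to the same ground height are all illustrative.

```python
import numpy as np


def place_object(obj: np.ndarray, target_xy: np.ndarray):
    """Hypothetical placement helper for a sensor at the origin.

    obj:       (N, 3) object points in a leveled scene.
    target_xy: (2,) desired xy location of the object's center.
    Returns the moved points and the yaw change (radians), which a box label
    would also need to be rotated by.
    """
    center = obj[:, :2].mean(axis=0)  # assumption: mean xy as the object center
    theta0 = np.arctan2(center[1], center[0])
    theta1 = np.arctan2(target_xy[1], target_xy[0])
    dtheta = theta1 - theta0

    # Rotate about the sensor's z-axis: the object keeps its range and the
    # same face toward the sensor, but now sits at the target azimuth.
    c, s = np.cos(dtheta), np.sin(dtheta)
    Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    rotated = obj @ Rz.T

    # Slide along the target azimuth from the original range to the target range.
    r0 = np.linalg.norm(center)
    r1 = np.linalg.norm(target_xy)
    shift = (r1 - r0) * np.array([np.cos(theta1), np.sin(theta1), 0.0])
    return rotated + shift, dtheta
```

A pure translation would expose the side of the object the sensor never saw (Figure 4, bottom left). The rotation avoids that, and because it is about the $z$-axis it leaves heights unchanged.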
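The occlusion test described in the Figure 5 caption reduces to a few array operations. For an object point $\hat{\mathbf{o}}_{i}$ with unit direction $\mathbf{u}_{i} = \hat{\mathbf{o}}_{i} / \lVert \hat{\mathbf{o}}_{i} \rVert$, the Pythagorean theorem gives $d_{i,j}^{2} = \lVert \mathbf{b}_{j} \rVert^{2} - (\mathbf{b}_{j} \cdot \mathbf{u}_{i})^{2}$. The sketch below assumes the sensor is at the origin. It approximates the "sector" as the azimuth interval spanned by the object points plus an optional margin, which also assumes the object does not straddle the $\pm\pi$ azimuth seam. The name `occlude` and its parameters are hypothetical.

```python
import numpy as np


def occlude(obj: np.ndarray, bg: np.ndarray, threshold: float = 0.1, margin: float = 0.0):
    """Hypothetical occlusion helper. Returns (visible_obj_points, visible_mask)."""
    obj_az = np.arctan2(obj[:, 1], obj[:, 0])
    bg_az = np.arctan2(bg[:, 1], bg[:, 0])
    obj_r = np.linalg.norm(obj, axis=1)
    bg_r = np.linalg.norm(bg, axis=1)

    # B_beta: background points in the object's azimuth sector that are closer
    # to the sensor than at least one object point (i.e., than the farthest one).
    in_sector = (bg_az >= obj_az.min() - margin) & (bg_az <= obj_az.max() + margin)
    closer = bg_r < obj_r.max()
    b_beta = bg[in_sector & closer]
    if len(b_beta) == 0:
        return obj, np.ones(len(obj), dtype=bool)

    units = obj / obj_r[:, None]                # u_i, shape (N, 3)
    proj = units @ b_beta.T                     # b_j . u_i, shape (N, K)
    sq_norms = (b_beta ** 2).sum(axis=1)        # ||b_j||^2, shape (K,)
    d2 = np.maximum(sq_norms[None, :] - proj ** 2, 0.0)  # d_ij^2, clipped for rounding

    # An object point is occluded if any background point lies within
    # `threshold` of its ray.
    visible = ~(d2 < threshold ** 2).any(axis=1)
    return obj[visible], visible
```

Following the caption, the "closer" filter compares against the object as a whole, not against each $\hat{\mathbf{o}}_{i}$ individually. A stricter variant could additionally require $0 < \mathbf{b}_{j} \cdot \mathbf{u}_{i} < \lVert \hat{\mathbf{o}}_{i} \rVert$ per point.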