
Just Add $100 More: Augmenting NeRF-based Pseudo-LiDAR Point Cloud for Resolving Class-imbalance Problem

Mincheol Chang, Siyeong Lee, Jinkyu Kim, Namil Kim

TL;DR

This work proposes leveraging pseudo-LiDAR point clouds, generated at low cost from videos capturing a surround view of miniatures or real-world objects of minority classes. It demonstrates the method's superiority and generality through performance improvements in extensive experiments on three popular benchmarks.

Abstract

Typical LiDAR-based 3D object detection models are trained in a supervised manner on real-world data, which is often imbalanced across classes (long-tailed). A common remedy is to augment minority-class examples by sampling ground-truth (GT) LiDAR points from a database and pasting them into a scene of interest. However, two challenges remain: inflexibility in where GT samples can be placed, and limited sample diversity. In this work, we propose to leverage pseudo-LiDAR point clouds generated at low cost from videos capturing a surround view of miniatures or real-world objects of minority classes. Our method, called Pseudo Ground Truth Augmentation (PGT-Aug), consists of three main steps: (i) volumetric 3D instance reconstruction using a 2D-to-3D view synthesis model, (ii) object-level domain alignment with LiDAR intensity estimation, and (iii) a hybrid context-aware placement method that uses ground and map information. We demonstrate the superiority and generality of our method through performance improvements in extensive experiments on three popular benchmarks, i.e., nuScenes, KITTI, and Lyft. The gains are especially notable on datasets with large domain gaps, which are captured with different LiDAR configurations. Our code and data will be publicly available upon publication.
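The GT-sampling augmentation that PGT-Aug builds on keeps a bank of object point clouds and pastes sampled objects into new scenes, with GT and PGT objects mixed as in Figure 5. It can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The helper names (`sample_from_banks`, `paste_objects`, `bev_overlap`, `inside_box`), the `mix_ratio` knob, and the axis-aligned (yaw-free) boxes are all hypothetical. The candidate centers are simply passed in, standing in for the paper's ground- and map-based placement.

```python
import numpy as np

# Hypothetical helpers illustrating GT-sampling-style paste augmentation.
# Boxes are axis-aligned (x, y, z, dx, dy, dz); yaw is ignored for simplicity.
# Object points are (N, 4) arrays of (x, y, z, intensity) relative to the box center.


def bev_overlap(box_a, box_b):
    """True if two axis-aligned boxes overlap in bird's-eye view (x-y plane)."""
    return (abs(box_a[0] - box_b[0]) * 2 < box_a[3] + box_b[3]
            and abs(box_a[1] - box_b[1]) * 2 < box_a[4] + box_b[4])


def inside_box(points, box):
    """Boolean mask of points lying inside an axis-aligned 3D box."""
    half = np.asarray(box[3:6]) / 2
    return np.all(np.abs(points[:, :3] - box[:3]) <= half, axis=1)


def sample_from_banks(gt_bank, pgt_bank, n, mix_ratio, rng):
    """Draw n objects: PGT with probability mix_ratio (assumed knob), else GT.
    gt_bank must be non-empty."""
    picks = []
    for _ in range(n):
        use_pgt = bool(pgt_bank) and rng.random() < mix_ratio
        bank = pgt_bank if use_pgt else gt_bank
        picks.append(bank[rng.integers(len(bank))])
    return picks


def paste_objects(scene_points, scene_boxes, objects, centers):
    """Paste each object at its candidate center unless its BEV footprint
    collides with an existing or already-pasted box. Scene points inside a
    pasted box are removed so the object is not buried in background points."""
    boxes = [np.asarray(b, dtype=float) for b in scene_boxes]
    background = scene_points
    pasted = []
    for obj, center in zip(objects, centers):
        box = np.concatenate([center, obj["size"]])
        if any(bev_overlap(box, b) for b in boxes):
            continue  # collision: skip this candidate
        background = background[~inside_box(background, box)]
        pts = obj["points"].copy()
        pts[:, :3] += center  # move from object frame to scene frame
        pasted.append(pts)
        boxes.append(box)
    points = np.concatenate([background] + pasted, axis=0)
    return points, np.array(boxes).reshape(-1, 6)


# Toy usage: a random scene and one-object GT/PGT banks (illustrative data only).
rng = np.random.default_rng(0)
scene = rng.uniform(-20, 20, size=(1000, 4))
size = np.array([0.6, 0.6, 1.7])  # pedestrian-like box size (illustrative)
gt_bank = [{"points": rng.normal(0, 0.2, size=(50, 4)), "size": size}]
pgt_bank = [{"points": rng.normal(0, 0.2, size=(80, 4)), "size": size}]
objs = sample_from_banks(gt_bank, pgt_bank, n=3, mix_ratio=0.5, rng=rng)
centers = np.array([[5.0, 5.0, 0.0], [5.2, 5.1, 0.0], [-8.0, 3.0, 0.0]])
points, boxes = paste_objects(scene, [], objs, centers)
print(boxes.shape)  # (2, 6): the second candidate overlaps the first and is skipped
```

Rejecting candidates by footprint overlap is the simplest collision rule. A rotated-box IoU test would be needed for real detection data.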

Paper Structure

This paper contains 15 sections, 4 equations, 12 figures, 11 tables, and 1 algorithm.

Figures (12)

  • Figure 1: (a) Original nuScenes class distribution, which becomes more balanced by augmenting our generated pseudo-LiDAR samples for target minor classes. (b) Our proposed PGT-Aug reconstructs objects' 3D volumetric representation from multi-view images of miniatures and real-world objects of minor classes, followed by NeRF-based pseudo-LiDAR generation (see top). (c) These are then used to augment samples into a new scene to train LiDAR-based object detectors (see bottom).
  • Figure 2: Overview of the Pseudo GT (PGT)-Aug framework. Given (a) surround-view images of miniatures or public videos of minority-class objects, (b) we first reconstruct their volumetric representations by estimating camera poses and extracting the foreground. This is followed by 2D-to-3D rendering to obtain RGB-based point clouds of the objects (see the volumetric 3D reconstruction section). (c) We post-process the RGB-based point clouds with spatial point rearrangement and a CycleGAN-based intensity estimator (see the view-dependent point generation section), producing (d) view-dependent pseudo-LiDAR point clouds. These points are stored in a bank alongside ground-truth (GT) samples, and (e) sampled objects are pasted into the target scene based on hybrid ground and map information (see the augmentation section).
  • Figure 3: Comparisons of View-Dependent and View-Agnostic Rendering. By maintaining a consistent number of colored points through view-agnostic rendering, we can create a view-agnostic bounding box for the given object.
  • Figure 4: Effect of Rearranged Range Projection on Different Benchmarks.
  • Figure 5: Mixing ratio between GT and PGT objects
  • ...and 7 more figures
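Figure 2(c) and Figure 4 refer to a spatial points rearrangement (rearranged range projection) that makes RGB-derived point clouds view-dependent, like real LiDAR returns. The exact procedure is not given here. A common way to emulate a scanner's sampling, shown below as an assumption rather than the paper's method, is to project points onto a spherical range image and keep only the nearest point per cell. The function name `range_project` and the sensor parameters are hypothetical. The defaults of 32 beams, 1024 azimuth bins, and a +10° to -30° vertical field of view are illustrative only.

```python
import numpy as np

# Sketch of a spherical range projection that thins a dense point cloud to at
# most one point per (beam, azimuth) cell, keeping the nearest point so that
# occluded surfaces drop out. Sensor parameters are illustrative assumptions.


def range_project(points, n_rows=32, n_cols=1024, fov_up_deg=10.0, fov_down_deg=-30.0):
    """Return the subset of `points` ((N, C) array, xyz in the first three
    columns, sensor at the origin) that a virtual scanner would observe."""
    xyz = points[:, :3]
    r = np.linalg.norm(xyz, axis=1)
    keep = r > 1e-6  # drop points at the sensor origin
    points, xyz, r = points[keep], xyz[keep], r[keep]

    yaw = np.arctan2(xyz[:, 1], xyz[:, 0])                 # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(xyz[:, 2] / r, -1.0, 1.0))   # elevation in [-pi/2, pi/2]
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)

    col = np.floor((yaw + np.pi) / (2 * np.pi) * n_cols).astype(int)
    col = np.minimum(col, n_cols - 1)  # yaw == pi maps to the last column
    row = np.floor((fov_up - pitch) / (fov_up - fov_down) * n_rows).astype(int)

    in_fov = (row >= 0) & (row < n_rows)  # discard points outside the vertical FOV
    points, r, row, col = points[in_fov], r[in_fov], row[in_fov], col[in_fov]

    cell = row * n_cols + col
    order = np.argsort(r, kind="stable")  # nearest points first
    _, first = np.unique(cell[order], return_index=True)
    return points[order[first]]  # nearest point in each occupied cell


pts = np.array([[10.0, 0, 0, 0.5],   # visible
                [20.0, 0, 0, 0.7],   # same ray, farther: occluded
                [0.0, 10, 0, 0.1],   # different azimuth: visible
                [1.0, 0, 5, 0.2]])   # ~79 degree elevation: outside the FOV
print(range_project(pts))  # keeps the points at (10, 0, 0) and (0, 10, 0)
```

Keeping the nearest return per cell reproduces self-occlusion and the ring-like density falloff of a real scanner. The number of rows and columns controls how sparse the resulting pseudo-LiDAR object becomes.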