Perspective-Invariant 3D Object Detection

Ao Liang, Lingdong Kong, Dongyue Lu, Youquan Liu, Jian Fang, Huaici Zhao, Wei Tsang Ooi

TL;DR

This work presents Pi3DET, the first multi-platform LiDAR 3D detection benchmark spanning vehicle, drone, and quadruped platforms, paired with Pi3DET-Net, a two-stage cross-platform adaptation framework that jointly learns geometry robustness and feature alignment to achieve perspective-invariant 3D detection. The approach introduces Random Platform Jitter and Virtual Platform Pose for geometric alignment, plus Geometry-Aware Transformation Descriptor and KL Probabilistic Feature Alignment for semantic feature alignment, enabling effective knowledge transfer from vehicle data to non-vehicle platforms. Extensive experiments on cross-platform and cross-dataset tasks, including cross-platform benchmarks across 18 detectors, demonstrate substantial improvements over baselines and strong generalization capabilities. The dataset, toolkit, and benchmark are publicly released to foster development of generalizable 3D perception systems across diverse autonomous platforms.

Abstract

With the rise of robotics, LiDAR-based 3D object detection has garnered significant attention in both academia and industry. However, existing datasets and methods predominantly focus on vehicle-mounted platforms, leaving other autonomous platforms underexplored. To bridge this gap, we introduce Pi3DET, the first benchmark featuring LiDAR data and 3D bounding box annotations collected from multiple platforms: vehicle, quadruped, and drone, thereby facilitating research in 3D object detection for non-vehicle platforms as well as cross-platform 3D detection. Based on Pi3DET, we propose a novel cross-platform adaptation framework that transfers knowledge from the well-studied vehicle platform to other platforms. This framework achieves perspective-invariant 3D detection through robust alignment at both geometric and feature levels. Additionally, we establish a benchmark to evaluate the resilience and robustness of current 3D detectors in cross-platform scenarios, providing valuable insights for developing adaptive 3D perception systems. Extensive experiments validate the effectiveness of our approach on challenging cross-platform tasks, demonstrating substantial gains over existing adaptation methods. We hope this work paves the way for generalizable and unified 3D perception systems across diverse and complex environments. Our Pi3DET dataset, cross-platform benchmark suite, and annotation toolkit have been made publicly available.
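As a rough illustration of what feature-level alignment with a KL objective can look like, the sketch below fits a diagonal Gaussian to a batch of target-platform features and another to source-platform features, then measures the closed-form KL divergence between them. The diagonal-Gaussian modeling, the function names `gaussian_kl` and `feature_alignment_loss`, and the `eps` stabilizer are assumptions made for this example; they are not taken from the paper and may differ from how Pi3DET-Net formulates its KL Probabilistic Feature Alignment.

```python
import numpy as np


def gaussian_kl(mu_t, var_t, mu_s, var_s):
    """Closed-form KL( N(mu_t, var_t) || N(mu_s, var_s) ) for diagonal Gaussians.

    All inputs are arrays of per-dimension means and variances; the result
    is summed over the last axis.
    """
    return 0.5 * np.sum(
        np.log(var_s / var_t) + (var_t + (mu_t - mu_s) ** 2) / var_s - 1.0,
        axis=-1,
    )


def feature_alignment_loss(target_feats, source_feats, eps=1e-6):
    """Hypothetical alignment loss between two (N, C) feature batches.

    Each batch is summarized by its per-channel mean and variance
    (an assumption for illustration), and the loss is the KL divergence
    from the target statistics to the source statistics.
    """
    mu_t, var_t = target_feats.mean(axis=0), target_feats.var(axis=0) + eps
    mu_s, var_s = source_feats.mean(axis=0), source_feats.var(axis=0) + eps
    return gaussian_kl(mu_t, var_t, mu_s, var_s)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.normal(loc=0.0, scale=1.0, size=(512, 16))
    target = rng.normal(loc=0.5, scale=1.5, size=(512, 16))
    print("loss before alignment:", feature_alignment_loss(target, source))
    print("loss for identical batches:", feature_alignment_loss(source, source))
```

The loss is zero when the two batches share the same statistics and grows as the target distribution drifts from the source, which is the property a training objective of this kind would exploit.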

Paper Structure

This paper contains 50 sections, 9 equations, 18 figures, 22 tables.

Figures (18)

  • Figure 1: Motivation of Perspective invariant 3D object DETection (Pi3DET). We focus on the practical yet challenging task of 3D object detection from heterogeneous robot platforms: Vehicle, Drone, and Quadruped. To achieve strong generalization, we contribute: 1) The first dataset for multi-platform 3D detection, comprising more than $\mathbf{51}$K LiDAR frames with over $\mathbf{250}$K meticulously annotated 3D bounding boxes; 2) An adaptation framework that effectively transfers capabilities from vehicles to other platforms by integrating geometric and feature-level representations; 3) A comprehensive benchmark study of state-of-the-art 3D detectors in cross-platform scenarios.
  • Figure 2: Analysis of perspective differences across three robot platforms. We present the statistics of point elevation distribution (upper-left), ego motion distribution (bottom-left), and target bounding box distribution (right), along with means and variances for each platform's data. We use different colors to denote the different platforms, i.e., Vehicle, Drone, and Quadruped. Best viewed in color.
  • Figure 3: Framework Overview. The proposed Pi3DET-Net consists of two main stages: Pre-Adaption (PA) and Knowledge-Adaption (KA), which aim to bridge the gap across heterogeneous robot platforms through alignment at both the geometric level (see the geometry alignment section) and the feature level (see the feature alignment section). On the geometric side, PA employs Random Platform Jitter to enhance robustness against ego-motion variations, while KA uses Virtual Platform Pose to simulate source-like viewpoints to achieve bidirectional geometric alignment across platforms. On the feature side, Pi3DET-Net further incorporates KL Probabilistic Feature Alignment to align target features with the source space, along with a Geometry-Aware Transformation Descriptor to correct global transformations across platforms.
  • Figure 4: Model Pre-Training Interface: This interface enables the pre-training of various 3D detection models to generate initial pseudo labels for subsequent processing.
  • Figure 5: Pseudo-Label Filtering Interface: In this view, 3D bounding boxes are projected onto corresponding RGB images, facilitating efficient and convenient filtering of pseudo labels.
  • ...and 13 more figures
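
The Figure 3 caption describes Random Platform Jitter as an augmentation that makes a detector robust to ego-motion variations. A minimal sketch of that idea, under assumptions of our own, is to perturb each LiDAR sweep with a small random roll and pitch rotation plus a vertical offset. The function names, the uniform sampling, and the default ranges (`max_angle_deg`, `max_height`) are hypothetical choices for illustration and are not taken from the paper. A full augmentation would also have to transform the 3D box annotations consistently, which is omitted here.

```python
import numpy as np


def rotation_roll_pitch(roll, pitch):
    """Rotation matrix for a roll about the x-axis followed by a pitch about the y-axis."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    rot_x = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    rot_y = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    return rot_y @ rot_x


def random_platform_jitter(points, rng, max_angle_deg=5.0, max_height=0.5):
    """Apply a random roll/pitch rotation and height offset to a point cloud.

    points: (N, C) array whose first three columns are x, y, z; any extra
            columns (for example intensity) are left unchanged.
    Returns the jittered points together with the sampled rotation and
    height offset so that labels could be transformed the same way.
    """
    max_angle = np.deg2rad(max_angle_deg)
    roll, pitch = rng.uniform(-max_angle, max_angle, size=2)
    dz = rng.uniform(-max_height, max_height)
    rot = rotation_roll_pitch(roll, pitch)

    jittered = points.copy()
    jittered[:, :3] = points[:, :3] @ rot.T  # rotate each point (row vectors)
    jittered[:, 2] += dz                     # shift the sensor height
    return jittered, rot, dz


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(1000, 4))  # synthetic x, y, z, intensity
    jittered, rot, dz = random_platform_jitter(cloud, rng)
    print("sampled height offset:", dz)
    print("mean point displacement:", np.linalg.norm(jittered[:, :3] - cloud[:, :3], axis=1).mean())
```

Returning the sampled rotation and offset keeps the transform invertible, so the same perturbation can be applied to annotations or undone at evaluation time.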