Learning to Detect Objects from Multi-Agent LiDAR Scans without Manual Labels

Qiming Xia, Wenkai Lin, Haoen Xiang, Xun Huang, Siheng Chen, Zhen Dong, Cheng Wang, Chenglu Wen

TL;DR

DOtA is a novel unsupervised method that learns to Detect Objects from Multi-Agent LiDAR scans without using external labels. It outperforms state-of-the-art unsupervised 3D object detection methods, and the effectiveness of its labels is validated under various collaborative perception frameworks.

Abstract

Unsupervised 3D object detection serves as an important solution for offline 3D object annotation. However, due to data sparsity and limited views, the clustering-based label fitting used in unsupervised object detection often generates low-quality pseudo-labels. Multi-agent collaborative datasets, which involve the sharing of complementary observations among agents, hold the potential to break through this bottleneck. In this paper, we introduce a novel unsupervised method that learns to Detect Objects from Multi-Agent LiDAR scans, termed DOtA, without using external labels. DOtA first uses the internally shared ego-pose and ego-shape of collaborative agents to initialize the detector, leveraging the generalization performance of neural networks to infer preliminary labels. Subsequently, DOtA uses the complementary observations between agents to perform multi-scale encoding on the preliminary labels, then decodes them into high-quality and low-quality labels. These labels are further used as prompts to guide a correct feature learning process, thereby enhancing the performance of the unsupervised object detection task. Extensive experiments on the V2V4Real and OPV2V datasets show that DOtA outperforms state-of-the-art unsupervised 3D object detection methods. Additionally, we validate the effectiveness of the DOtA labels under various collaborative perception frameworks. The code is available at https://github.com/xmuqimingxia/DOtA.
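
As a rough illustration of how shared ego-pose and ego-shape can yield labels without manual annotation, the sketch below expresses a collaborating agent's box in the ego LiDAR frame. It is not taken from the DOtA code: the function name `agent_box_in_ego_frame`, the planar `(x, y, yaw)` pose format, and the `(length, width, height)` shape format are all assumptions made for this example.

```python
import math


def agent_box_in_ego_frame(ego_pose, agent_pose, agent_shape):
    """Express a collaborating agent's box in the ego LiDAR frame.

    Assumed (hypothetical) formats:
      ego_pose, agent_pose: (x, y, yaw) in a shared world frame, yaw in radians.
      agent_shape: (length, width, height) of the collaborating vehicle.
    Returns (x, y, yaw, length, width, height) in the ego frame.
    """
    ex, ey, eyaw = ego_pose
    ax, ay, ayaw = agent_pose

    # Offset of the agent from the ego vehicle in the world frame.
    dx, dy = ax - ex, ay - ey

    # Rotate the offset by -eyaw so it is expressed along the ego axes.
    cos_e, sin_e = math.cos(eyaw), math.sin(eyaw)
    x_rel = cos_e * dx + sin_e * dy
    y_rel = -sin_e * dx + cos_e * dy

    # Relative heading, wrapped into [-pi, pi).
    yaw_rel = (ayaw - eyaw + math.pi) % (2 * math.pi) - math.pi

    return (x_rel, y_rel, yaw_rel, *agent_shape)


# Ego faces +y in the world frame; the other agent is 10 m ahead, same heading.
box = agent_box_in_ego_frame(
    ego_pose=(0.0, 0.0, math.pi / 2),
    agent_pose=(0.0, 10.0, math.pi / 2),
    agent_shape=(4.5, 2.0, 1.6),
)
```

Each box obtained this way marks a vehicle whose position and size are known from the shared messages alone, which is the kind of supervision the abstract describes for initializing the detector. A full 3D version would also handle height, roll, and pitch.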

Paper Structure

This paper contains 17 sections, 7 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Comparison of the performance of various methods under multi-view synchronous observation. (a) Left: DOtA achieves the best performance on the real-world collaborative V2V4Real \cite{v2v4real} dataset (more details are in Tab. \ref{tab:com_unsupervised}). (b) Right: following the localization noise parameters established in Where2comm \cite{where2comm}, we conduct experiments on the simulated OPV2V \cite{opv2v} dataset to assess the robustness of our approach against realistic localization noise. DOtA is more robust to localization noise than previous SOTA methods.
  • Figure 2: (a) A toy example demonstrating a moving vehicle under multi-agent observation. (b) The instance structure is incomplete under single-view observation in the current frame. (c) Historical frames from a single view cannot complete the missing information. (d) Multi-view observation of a moving vehicle.
  • Figure 3: Overview of the proposed DOtA. (a) The initial detector, pre-trained with shared information, infers preliminary labels. (b) Multi-scale transformations are utilized to encode contextual information for preliminary labels, with the discriminator $\mathcal{D}$ integrating the encoded information from various agents to distinguish between high-quality and low-quality labels. (c) Distinguishable labels serve as prompts, and Label-Internal Contrastive Learning (LICL) is leveraged to guide the learning of correct features while suppressing the learning of erroneous ones.
  • Figure 4: IoU distribution between pseudo-labels and ground truth. Panels (a), (b), and (c) correspond to OPV2V; panels (d), (e), and (f) correspond to V2V4Real.
  • Figure 5: Visualization of the label filtering process of DOtA on the OPV2V $train$ split.
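
An IoU distribution like the one described for Figure 4 can be approximated by matching each pseudo-label to its best-overlapping ground-truth box and binning the result. The sketch below is a simplified illustration, not the paper's evaluation code: it assumes axis-aligned bird's-eye-view boxes given as `(cx, cy, length, width)`, whereas real 3D detection boxes are rotated, and the helper names are hypothetical.

```python
def bev_iou_axis_aligned(box_a, box_b):
    """IoU of two axis-aligned BEV boxes given as (cx, cy, length, width).

    Simplifying assumption: box heading is ignored.
    """
    ax0, ax1 = box_a[0] - box_a[2] / 2, box_a[0] + box_a[2] / 2
    ay0, ay1 = box_a[1] - box_a[3] / 2, box_a[1] + box_a[3] / 2
    bx0, bx1 = box_b[0] - box_b[2] / 2, box_b[0] + box_b[2] / 2
    by0, by1 = box_b[1] - box_b[3] / 2, box_b[1] + box_b[3] / 2

    # Overlap extent along each axis, clamped at zero when boxes are disjoint.
    inter_x = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_y = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_x * inter_y

    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def iou_histogram(pseudo_boxes, gt_boxes, num_bins=10):
    """Count pseudo-labels per IoU bin, using each label's best IoU with any GT box."""
    counts = [0] * num_bins
    for pseudo in pseudo_boxes:
        best = max((bev_iou_axis_aligned(pseudo, gt) for gt in gt_boxes), default=0.0)
        # An IoU of exactly 1.0 falls into the last bin.
        index = min(int(best * num_bins), num_bins - 1)
        counts[index] += 1
    return counts


gt = [(0.0, 0.0, 4.0, 2.0)]
pseudo = [
    (0.0, 0.0, 4.0, 2.0),      # perfect match
    (2.0, 0.0, 4.0, 2.0),      # shifted by half a length
    (100.0, 100.0, 4.0, 2.0),  # false positive far from any object
]
counts = iou_histogram(pseudo, gt)
```

A label set with more mass in the high-IoU bins and less near zero indicates higher pseudo-label quality, which is the comparison such a histogram is meant to support.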