
One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection

Zhenyu Wang, Yali Li, Hengshuang Zhao, Shengjin Wang

TL;DR

The authors propose domain-aware partitioning in scatter and context, guided by a routing mechanism, to address data-level interference, and incorporate the text modality for language-guided classification that unifies the label spaces of multiple datasets and mitigates category-level interference.

Abstract

The current trend in computer vision is to use one universal model to address a wide range of tasks. Achieving such a universal model inevitably requires joint training on multi-domain data so that the model learns across multiple problem scenarios. In point cloud based 3D object detection, however, such multi-domain joint training is highly challenging, because large domain gaps among point clouds from different datasets lead to severe domain interference. In this paper, we propose OneDet3D, a universal one-for-all model that addresses 3D detection across different domains, including diverse indoor and outdoor scenes, within the same framework and with only one set of parameters. We propose domain-aware partitioning in scatter and context, guided by a routing mechanism, to address data-level interference, and further incorporate the text modality for language-guided classification that unifies the multi-dataset label spaces and mitigates category-level interference. The fully sparse structure and anchor-free head further accommodate point clouds with significant scale disparities. Extensive experiments demonstrate the strong universality of OneDet3D: a single trained model handles almost all 3D object detection tasks.
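The abstract does not spell out how the domain-aware partitioning works. One plausible reading, which is our assumption rather than the paper's stated design, is that most weights are shared across domains while a small set of domain-specific normalization parameters is kept per domain, with a router choosing which set to apply. The NumPy sketch below covers only that scatter-side idea and routes by a dataset tag; the class name `DomainRoutedNorm` and the domain keys are hypothetical.

```python
import numpy as np


class DomainRoutedNorm:
    """Hypothetical sketch: a shared normalization layer with per-domain parameters.

    Assumption (not taken from the paper): each domain keeps its own affine
    parameters (gamma, beta), and a router selects them by domain name.
    """

    def __init__(self, num_channels, domains, eps=1e-5):
        self.eps = eps
        self.index = {d: i for i, d in enumerate(domains)}
        # One (gamma, beta) pair per domain: the "partitioned" parameters.
        self.gamma = np.ones((len(domains), num_channels))
        self.beta = np.zeros((len(domains), num_channels))

    def route(self, domain):
        # Hypothetical router: a plain lookup on the dataset tag.
        # A learned router would predict this index from the features instead.
        return self.index[domain]

    def __call__(self, feats, domain):
        # feats: (num_points_or_voxels, num_channels) sparse features of one scene.
        k = self.route(domain)
        # Per-scene statistics, so indoor and outdoor scales do not mix.
        mean = feats.mean(axis=0, keepdims=True)
        var = feats.var(axis=0, keepdims=True)
        normed = (feats - mean) / np.sqrt(var + self.eps)
        # Apply the affine parameters belonging to the routed domain.
        return self.gamma[k] * normed + self.beta[k]
```

For example, `DomainRoutedNorm(4, ["sunrgbd", "scannet", "kitti", "nuscenes"])` normalizes a ScanNet scene and a KITTI scene with the same code path but different affine parameters. Keeping the partitioned part this small lets indoor and outdoor scans share the rest of the backbone.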


Paper Structure

This paper contains 16 sections, 4 equations, 8 figures, 10 tables.

Figures (8)

  • Figure 1: A high-level overview comparing the multi-domain joint training performance of 10 existing 3D detectors and our OneDet3D. These models are jointly trained on the indoor datasets SUN RGB-D (SUN) and ScanNet, and the outdoor datasets KITTI and nuScenes (nuS). We also evaluate cross-domain performance on the indoor S3DIS and outdoor Waymo datasets. The center of the circle corresponds to a metric below 10%, and the outermost ring to 90%. Existing indoor detectors are plotted in red, outdoor detectors in green, and detectors that target different scenes in orange. Our model has the remarkable capacity to generalize across a wide range of diverse 3D scenes (a larger polygon area) with only one set of parameters and the same architecture.
  • Figure 2: Illustration of existing 3D detectors (a) and ours (b). Existing detectors can be divided into point-based (top) and voxel-based (bottom). Our model is capable of joint training on multi-domain point cloud data.
  • Figure 3: The overview of OneDet3D. It utilizes multi-domain point clouds for training. The domain-aware partitioning in scatter and context avoids the data-level interference issue, and the language-guided classification addresses the issue from category-level interference. Once trained, OneDet3D has the one-for-all ability to generalize to unseen domains, categories, and diverse scenes.
  • Figure 4: The visualized results of OneDet3D on the indoor SUN RGB-D, ScanNet and outdoor KITTI, nuScenes datasets separately.
  • Figure 5: The visualized results of OneDet3D on the indoor SUN RGB-D dataset.
  • ...and 3 more figures
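Figure 3's caption attributes the handling of category-level interference to language-guided classification. A common way to realize this idea, assumed here rather than taken from the paper, is to score each detected box against text embeddings of category names, so that a class shared by several datasets (e.g. "car" in KITTI and nuScenes) maps to a single embedding instead of separate output indices. In the NumPy sketch below, the helper names are hypothetical, random unit vectors stand in for a real text encoder, the class lists are illustrative subsets rather than the datasets' official label sets, and the temperature of 0.07 is an arbitrary choice.

```python
import numpy as np


def build_label_space(dataset_classes):
    """Merge per-dataset class lists into one shared vocabulary (order-preserving)."""
    vocab = []
    for classes in dataset_classes.values():
        for name in classes:
            if name not in vocab:  # a shared name such as "car" is kept once
                vocab.append(name)
    return vocab


def fake_text_embed(names, dim, seed=0):
    # Stand-in for a frozen text encoder: one random unit vector per class name.
    rng = np.random.default_rng(seed)
    emb = rng.normal(size=(len(names), dim))
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)


def classify(box_feats, text_emb, temperature=0.07):
    # Cosine similarity between L2-normalized box features and class-name embeddings.
    f = box_feats / np.linalg.norm(box_feats, axis=1, keepdims=True)
    logits = f @ text_emb.T / temperature
    return logits.argmax(axis=1), logits
```

For instance, merging `{"scannet": ["chair", "table", "sofa"], "kitti": ["car", "pedestrian"], "nuscenes": ["car", "truck"]}` yields one "car" entry for both outdoor datasets. Because classes are matched by name rather than by a fixed output index, adding a dataset only extends the vocabulary, and the box-feature side of the classifier stays unchanged.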