DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception

Kai Jiang, Jiaxing Huang, Weiying Xie, Yunsong Li, Ling Shao, Shijian Lu

TL;DR

DA-BEV is the first domain-adaptive camera-only BEV framework. It addresses the challenges of domain-adaptive BEV perception by exploiting the complementary nature of image-view features and BEV features, and it introduces the idea of queries into the domain adaptation framework to derive useful information from both feature types.

Abstract

Camera-only Bird's Eye View (BEV) perception has demonstrated great potential for environment perception in 3D space. However, most existing studies were conducted under a supervised setup, which does not scale well when handling various new data. Unsupervised domain adaptive BEV, which enables effective learning from various unlabelled target data, remains largely under-explored. In this work, we design DA-BEV, the first domain adaptive camera-only BEV framework that addresses domain adaptive BEV challenges by exploiting the complementary nature of image-view features and BEV features. DA-BEV introduces the idea of query into the domain adaptation framework to derive useful information from image-view and BEV features. It consists of two query-based designs, namely query-based adversarial learning (QAL) and query-based self-training (QST), which exploit image-view features or BEV features to regularize the adaptation of the other. Extensive experiments show that DA-BEV consistently achieves superior domain adaptive BEV perception performance across multiple datasets and tasks such as 3D object detection and 3D scene segmentation.

Paper Structure

This paper contains 17 sections, 15 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Domain adaptive bird’s eye view perception (DA-BEV). The first two rows show the detections of a source-only model (trained on source data with no adaptation) over a source scene and a target scene, respectively. The yellow 3D boxes indicate correct detections, while the red dotted boxes highlight false-positive and false-negative detections. The third row shows the detections of our DA-BEV on the same target scene. The two columns visualize the 3D predictions in multi-camera view and bird’s eye view, respectively.
  • Figure 2: The architecture of camera-only BEV models. Boxes with dashed lines denote model inputs, including multi-camera images and camera configurations. Boxes with solid lines stand for encoding/decoding processes and intermediate representations.
  • Figure 3: The overall framework of DA-BEV, which exploits the complementary nature of image-view and BEV features for unsupervised BEV perception adaptation. To this end, DA-BEV first introduces an additional 2D Image-View Decoder into the BEV perception model to capture image-view features with local 2D information, which complement BEV features that capture rich global 3D information. The training of DA-BEV comprises two designs: query-based adversarial learning (QAL) and query-based self-training (QST). The former exploits the complementary information from either image-view or BEV features to regularize the adversarial learning of the other, while the latter exploits the complementary information from both image-view and BEV features to regularize their self-training. Note that all auxiliary supervision flows are used during network training but discarded after adaptation, which adds slight computational overhead during model training but has little effect on testing.
  • Figure 4: Qualitative illustration of DA-BEV on 3D object detection for cross-weather domain adaptation (i.e., Clear Weather $\rightarrow$ Rainy Weather).
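
The Figure 3 caption describes query-based self-training as drawing on both image-view and BEV information. As a rough illustration of that general idea only, the sketch below filters pseudo-labels on unlabelled target data by combining two per-query confidences. Everything in it is an assumption made for illustration: the `QueryPrediction` type, the `select_pseudo_labels` function, the geometric-mean combination, and the threshold value are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryPrediction:
    """Hypothetical per-query prediction on an unlabelled target sample."""
    label: str
    bev_conf: float  # confidence from the BEV branch, assumed in [0, 1]
    img_conf: float  # confidence from the image-view branch, assumed in [0, 1]


def select_pseudo_labels(
    preds: List[QueryPrediction], threshold: float = 0.5
) -> List[QueryPrediction]:
    """Keep predictions whose joint confidence reaches the threshold.

    The joint confidence is the geometric mean of the two branch
    confidences. This is an illustrative choice, not the paper's rule.
    """
    kept = []
    for p in preds:
        joint = (p.bev_conf * p.img_conf) ** 0.5
        if joint >= threshold:
            kept.append(p)
    return kept


if __name__ == "__main__":
    candidates = [
        QueryPrediction("car", bev_conf=0.9, img_conf=0.8),
        QueryPrediction("car", bev_conf=0.9, img_conf=0.1),
        QueryPrediction("pedestrian", bev_conf=0.4, img_conf=0.4),
    ]
    for p in select_pseudo_labels(candidates):
        print(p.label, p.bev_conf, p.img_conf)
```

In this toy setup, a prediction that only one branch is confident about is dropped, which is one simple way two views could regularize each other's pseudo-labels.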