Towards Full-scene Domain Generalization in Multi-agent Collaborative Bird's Eye View Segmentation for Connected and Autonomous Driving

Senkang Hu, Zhengru Fang, Yiqin Deng, Xianhao Chen, Yuguang Fang, Sam Kwong

TL;DR

This work proposes a unified domain generalization framework to be utilized during the training and inference stages of collaborative perception, and introduces an Amplitude Augmentation method to augment low-frequency image variations, broadening the model's ability to learn across multiple domains.

Abstract

Collaborative perception has recently gained significant attention in autonomous driving, improving perception quality by enabling the exchange of additional information among vehicles. However, deploying collaborative perception systems can lead to domain shifts due to diverse environmental conditions and data heterogeneity among connected and autonomous vehicles (CAVs). To address these challenges, we propose a unified domain generalization framework to be utilized during the training and inference stages of collaborative perception. In the training phase, we introduce an Amplitude Augmentation (AmpAug) method to augment low-frequency image variations, broadening the model's ability to learn across multiple domains. We also employ a meta-consistency training scheme to simulate domain shifts, optimizing the model with a carefully designed consistency loss to acquire domain-invariant representations. In the inference phase, we introduce an intra-system domain alignment mechanism to reduce or potentially eliminate the domain discrepancy among CAVs prior to inference. Extensive experiments substantiate the effectiveness of our method in comparison with the existing state-of-the-art works.
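
The abstract describes AmpAug only at a high level, as augmenting low-frequency image variations. One plausible reading is sketched below, assuming a Fourier-style amplitude mix: the low-frequency amplitude of a source image is interpolated toward that of a reference image while the source phase is kept. The function name `amp_aug`, the window fraction `beta`, and the mixing weight `lam` are hypothetical and not taken from the paper.

```python
import numpy as np

def amp_aug(src, ref, beta=0.1, lam=1.0):
    """Mix the low-frequency amplitude of `ref` into `src` (H x W x C floats).

    beta: half-size of the central low-frequency window, as a fraction of
          min(H, W) (assumed parameter).
    lam:  mixing weight (assumed parameter); 0 keeps src unchanged,
          1 takes ref's amplitude inside the window.
    """
    assert src.shape == ref.shape
    # 2D FFT per channel, shifted so the zero frequency sits at the center.
    f_src = np.fft.fftshift(np.fft.fft2(src, axes=(0, 1)), axes=(0, 1))
    f_ref = np.fft.fftshift(np.fft.fft2(ref, axes=(0, 1)), axes=(0, 1))
    amp_src, pha_src = np.abs(f_src), np.angle(f_src)
    amp_ref = np.abs(f_ref)

    # Window that is symmetric around the zero frequency.
    h, w = src.shape[:2]
    b = max(1, int(beta * min(h, w)))
    ch, cw = h // 2, w // 2
    win = (slice(ch - b, ch + b + 1), slice(cw - b, cw + b + 1))

    # Interpolate amplitudes inside the window only.
    amp_mix = amp_src.copy()
    amp_mix[win] = (1 - lam) * amp_src[win] + lam * amp_ref[win]

    # Recombine with the source phase and return to image space.
    f_mix = np.fft.ifftshift(amp_mix * np.exp(1j * pha_src), axes=(0, 1))
    return np.real(np.fft.ifft2(f_mix, axes=(0, 1)))

rng = np.random.default_rng(0)
src = rng.random((32, 48, 3))        # stand-in for a source-domain image
ref = rng.random((32, 48, 3)) * 0.5  # stand-in for a darker reference image
out = amp_aug(src, ref, beta=0.1, lam=1.0)
```

With `lam=1.0` the output takes on the reference image's mean intensity per channel, because the zero-frequency amplitude lies inside the mixed window, while the source phase is left untouched.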

Paper Structure (16 sections, 17 equations, 12 figures, 4 tables, 1 algorithm)

Figures (12)

  • Figure 1: The problem setting of domain generalization for collaborative perception: the model must generalize to unseen domains while the domain gap among different CAVs is reduced.
  • Figure 2: Overall Architecture. 1) The first part of our proposed method is amplitude augmentation, which transforms the original source domain data to different target domains. 2) The second part is meta-consistency training, which simulates the domain shift, guiding the model to learn how to learn from different domains. We then exploit the meta-consistency loss to encourage the model to learn domain-invariant features, thereby enhancing its generalization ability. 3) The third part is intra-system domain alignment, which minimizes the domain gap among the data perceived by different collaborative vehicles prior to inference.
  • Figure 3: Visualization of our constructed target dataset $\mathcal{D}_t$ and the image fast Fourier transform (FFT). Subfigure (a) gives a brief illustration of the dataset, which contains images with different weather conditions, driving scenarios, colors, etc. Subfigure (b) visualizes the FFT of images: we transform an input image $x$ to the frequency domain by FFT and obtain the amplitude spectrum $A(x)$ (low-frequency spectrum) and the phase spectrum $P(x)$ (high-frequency spectrum). The amplitude spectrum indicates the magnitude of each frequency component present in the image, which is crucial for understanding the image's texture and style. The phase spectrum specifies the phase shifts of the different spatial frequency components present in the image, providing detailed information about the spatial arrangement and positioning of features within the image.
  • Figure 4: Qualitative comparison of image translation in the RGB and LAB color spaces.
  • Figure 5: Visualization of the foggy dataset. The first row is the original image, and the second row is the synthesized foggy image.
  • ...and 7 more figures
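
The amplitude and phase decomposition described in the Figure 3 caption can be illustrated with a minimal NumPy sketch. The helper names `amplitude_phase` and `reconstruct` are hypothetical, and a random array stands in for an image. The sketch checks two standard properties of the discrete Fourier transform: amplitude and phase together recover the image, and a circular shift of the image changes only the phase, which is consistent with the caption's statement that phase encodes spatial arrangement.

```python
import numpy as np

def amplitude_phase(x):
    """Return amplitude A(x) and phase P(x) of a 2D array's FFT."""
    f = np.fft.fft2(x)
    return np.abs(f), np.angle(f)

def reconstruct(amp, pha):
    """Invert the decomposition: combine amplitude and phase, then inverse FFT."""
    return np.real(np.fft.ifft2(amp * np.exp(1j * pha)))

rng = np.random.default_rng(0)
x = rng.random((16, 16))            # stand-in for a grayscale image

amp, pha = amplitude_phase(x)
x_rec = reconstruct(amp, pha)       # round trip back to image space

# Circularly shifting the content moves features without changing their magnitudes.
x_shift = np.roll(x, shift=(3, 5), axis=(0, 1))
amp_s, pha_s = amplitude_phase(x_shift)

print(np.allclose(x_rec, x))        # reconstruction matches the input
print(np.allclose(amp_s, amp))      # amplitude is unchanged by the shift
print(np.allclose(pha_s, pha))      # phase differs after the shift
```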