SiCP: Simultaneous Individual and Cooperative Perception for 3D Object Detection in Connected and Automated Vehicles

Deyuan Qu; Qi Chen; Tianyu Bai; Hongsheng Lu; Heng Fan; Hao Zhang; Song Fu; Qing Yang

TL;DR

Simultaneous Individual and Cooperative Perception (SiCP) is a generic framework that supports a wide range of state-of-the-art standalone perception backbones and enhances them with a novel Dual-Perception Network (DP-Net) designed for both individual and cooperative perception. SiCP surpasses state-of-the-art cooperative perception solutions while preserving the performance of standalone perception solutions.

Abstract

Cooperative perception for connected and automated vehicles is traditionally achieved by fusing feature maps from two or more vehicles. However, when no feature maps are shared by other vehicles, cooperative perception models suffer a significant drop in 3D object detection performance compared to standalone 3D detection models. This drawback impedes the adoption of cooperative perception, as vehicle resources are often insufficient to run two perception models concurrently. To tackle this issue, we present Simultaneous Individual and Cooperative Perception (SiCP), a generic framework that supports a wide range of state-of-the-art standalone perception backbones and enhances them with a novel Dual-Perception Network (DP-Net) designed to facilitate both individual and cooperative perception. Besides being lightweight, with only 0.13M parameters, DP-Net is robust and retains crucial gradient information during feature map fusion. A comprehensive evaluation on the V2V4Real and OPV2V datasets shows that, thanks to DP-Net, SiCP surpasses state-of-the-art cooperative perception solutions while preserving the performance of standalone perception solutions.
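The claim that DP-Net retains gradient information during fusion can be illustrated with a toy scalar example. The sketch below is an illustration under assumptions, not the authors' code: it compares the gradient that reaches each input at a single BEV cell under element-wise max (maxout) fusion and under a complementary weighted fusion $M \cdot F_{ego} + (1-M) \cdot F_j$. In DP-Net the weight $M$ is learned; here it is a hypothetical constant, and the function names are hypothetical.

```python
def maxout_fuse(a: float, b: float) -> tuple[float, float, float]:
    """Element-wise max fusion; returns (output, d_out/d_a, d_out/d_b)."""
    # Only the larger input receives gradient; the other gets exactly zero.
    if a >= b:
        return a, 1.0, 0.0
    return b, 0.0, 1.0


def weighted_fuse(a: float, b: float, m: float) -> tuple[float, float, float]:
    """Complementary weighted fusion m*a + (1-m)*b; returns (output, grads)."""
    # Both inputs receive non-zero gradient whenever 0 < m < 1.
    out = m * a + (1.0 - m) * b
    return out, m, 1.0 - m


# Hypothetical activations at one BEV cell: the ego map responds strongly,
# the received map weakly (e.g., an object the sender misinterprets).
ego, received = 0.9, 0.2
print(maxout_fuse(ego, received))  # (0.9, 1.0, 0.0): no gradient reaches `received`
print(weighted_fuse(ego, received, m=0.7))
```

With maxout, the weaker sender activation is cut off from the gradient entirely, so training cannot correct it; with the weighted form, both inputs receive a gradient proportional to their weight.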

Paper Structure (19 sections, 4 equations, 7 figures, 4 tables)

Introduction
Proposed Solution
Main Contributions
Related Works
Individual Perception
Cooperative Perception
Methodology
Dual-Perception Network (DP-Net)
Receiver-Agnostic Feature Sharing
Complementary Feature Fusion
Detection Head
Experiments
Quantitative Evaluations
DP-Net is a Robust Module on Alignment Error
DP-Net is a Lightweight Plug-and-Play Module
...and 4 more sections

Figures (7)

Figure 1: Different approaches to 3D perception. In (a), individual perception uses local sensor data for object detection. Cooperative perception, shown in (b), combines data from various vehicles to enhance the ego-vehicle's perception. Simultaneous Individual and Cooperative Perception (SiCP), as depicted in (c), supports both functionalities simultaneously.
Figure 2: An overview of the SiCP architecture, showing its components: a feature extractor, a feature processor (DP-Net), and a detection head. All vehicles have identical feature extractors, which produce fusible features. The feature processor manages local features $F_{ego}$ for individual perception and fused features $F^*_{ego}$ for cooperative perception. Features from other vehicles (e.g., $F_j$) are transformed into the ego vehicle's perspective and then complementarily fused with the ego vehicle's local features. The resulting feature map $F^*_{ego}$ is then processed by the detection head to generate classification and regression results for either individual or cooperative perception.
Figure 3: Complementary Feature Fusion efficiently merges two BEV (Bird's Eye View) feature maps by learning a weight map. It first concatenates the two feature maps and condenses them into a one-channel feature map using a 1×1 convolution. This feature map is then processed by two convolutional layers to generate the weight map $M$. $M$ weights the ego vehicle's local feature map, while the complementary map $(1-M)$ weights the received feature map. Finally, the two weighted feature maps are concatenated and reshaped to $H \times W \times C$.
Figure 4: Gradient information is lost during the fusion of feature maps. In (a), the receiver's feature map clearly indicates six objects, whereas the sender's feature map, shown in (b), misinterprets one of them (red rectangle). When the two maps are fused with the maxout function, the gradients of this object (red dotted rectangle) vanish, as shown in (c). Our method effectively preserves the gradient during feature fusion, as shown in (d).
Figure 5: Robustness to asynchronous mode and localization error. SiCP outperforms other state-of-the-art methods on both datasets at IoU = 0.7.
...and 2 more figures
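The fusion described in the Figure 3 caption can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 3×3 kernel size of the two convolutional layers, the ReLU between them, the sigmoid that keeps $M$ in $(0, 1)$, and the 1×1 convolution that maps the concatenated $H \times W \times 2C$ result back to $H \times W \times C$ are all assumptions, and the class name `ComplementaryFusion` is hypothetical. Weights are random, so only the shapes and the complementary weighting are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1x1(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution of an H x W x C_in map with a C_in x C_out weight matrix."""
    return x @ w  # matmul over the last (channel) axis


def conv3x3_single(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """'Same'-padded 3x3 convolution (cross-correlation, as in DL frameworks) of an H x W map."""
    h, w = x.shape
    padded = np.pad(x, 1)  # zero-pad one cell on every side
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class ComplementaryFusion:
    """Hypothetical parameterization of the fusion block in Figure 3."""

    def __init__(self, channels: int):
        self.w_squeeze = rng.normal(scale=0.1, size=(2 * channels, 1))
        self.k1 = rng.normal(scale=0.1, size=(3, 3))
        self.k2 = rng.normal(scale=0.1, size=(3, 3))
        self.w_out = rng.normal(scale=0.1, size=(2 * channels, channels))

    def weight_map(self, f_ego: np.ndarray, f_recv: np.ndarray) -> np.ndarray:
        x = np.concatenate([f_ego, f_recv], axis=-1)      # H x W x 2C
        x = conv1x1(x, self.w_squeeze)[..., 0]             # H x W (one channel)
        x = np.maximum(conv3x3_single(x, self.k1), 0.0)    # conv + ReLU (assumed)
        return sigmoid(conv3x3_single(x, self.k2))         # M, values in (0, 1)

    def __call__(self, f_ego: np.ndarray, f_recv: np.ndarray) -> np.ndarray:
        m = self.weight_map(f_ego, f_recv)[..., None]      # H x W x 1, broadcasts over C
        fused = np.concatenate([m * f_ego, (1.0 - m) * f_recv], axis=-1)  # H x W x 2C
        return conv1x1(fused, self.w_out)                   # back to H x W x C


H, W, C = 8, 8, 16
f_ego = rng.normal(size=(H, W, C))
f_j = rng.normal(size=(H, W, C))  # assumed already transformed into the ego frame
fusion = ComplementaryFusion(C)
print(fusion(f_ego, f_j).shape)  # (8, 8, 16)
```

Because both inputs enter through the smooth weights $M$ and $1-M$ rather than an element-wise max, gradients reach both the ego and the received feature maps at every cell, which is consistent with the behavior Figure 4 contrasts with maxout fusion.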
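Figure 2 states that features from other vehicles are transformed into the ego vehicle's perspective before fusion. A minimal sketch of the underlying coordinate mapping follows, assuming each vehicle's pose is known as $(x, y, \text{yaw})$ in a shared world frame and that a 2D rigid (SE(2)) transform in the BEV plane suffices; the function names are hypothetical. A full implementation would warp the entire feature map (for example, by resampling the sender's grid at the transformed cell centers), which this point-level example does not attempt.

```python
import math


def to_world(pose: tuple[float, float, float], px: float, py: float) -> tuple[float, float]:
    """Map a point from a vehicle's local BEV frame to the shared world frame."""
    x, y, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return x + c * px - s * py, y + s * px + c * py


def from_world(pose: tuple[float, float, float], wx: float, wy: float) -> tuple[float, float]:
    """Map a world-frame point into a vehicle's local frame (inverse rotation R^T)."""
    x, y, yaw = pose
    dx, dy = wx - x, wy - y
    c, s = math.cos(yaw), math.sin(yaw)
    return c * dx + s * dy, -s * dx + c * dy


def sender_to_ego(sender_pose, ego_pose, px: float, py: float) -> tuple[float, float]:
    """Express a point observed in the sender's frame in the ego vehicle's frame."""
    return from_world(ego_pose, *to_world(sender_pose, px, py))


# Hypothetical poses: the sender is 10 m ahead of the ego vehicle, rotated 90 degrees.
ego = (0.0, 0.0, 0.0)
sender = (10.0, 0.0, math.pi / 2)
x, y = sender_to_ego(sender, ego, 2.0, 0.0)
print(round(x, 6), round(y, 6))  # 10.0 2.0
```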
