CoopDETR: A Unified Cooperative Perception Framework for 3D Detection via Object Query

Zhe Wang; Shaocong Xu; Xucai Zhuang; Tongda Xu; Yan Wang; Jingjing Liu; Yilun Chen; Ya-Qin Zhang

TL;DR

CoopDETR tackles the bandwidth bottleneck in multi-agent cooperative perception by shifting from region-level or raw-data fusion to object-level feature cooperation through object queries. Each agent encodes its observations into a set of $N_q$ object queries with a PointDETR-based single-agent module. The queries are then fused across agents by Spatial Query Matching (SQM) and Object Query Aggregation (OQA), which fuse queries within object-specific graphs. Empirically, CoopDETR achieves state-of-the-art AP on OPV2V and V2XSet while reducing transmission volume to approximately 1/782 of that of prior intermediate-fusion methods, and it remains robust to pose errors in communication. These results suggest that object-centric collaboration is a practical route to scalable, high-performance cooperative perception in autonomous systems.
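The cross-agent step can be pictured with a minimal sketch. It assumes each query carries a 2D reference point and a feature vector, that a neighbour's pose relative to the ego agent is known as a translation (dx, dy) plus a yaw angle, and that queries from different agents describing the same object land within a fixed radius of each other once expressed in the ego frame. The names `Query`, `to_ego`, `match_and_aggregate` and the 2 m radius are hypothetical. Merging each group by plain mean pooling is a simple stand-in for the paper's learned Object Query Aggregation, not a reproduction of it.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Query:
    x: float            # reference-point x in the owning agent's frame (metres)
    y: float            # reference-point y
    feat: list[float]   # query feature vector (hypothetical dimension)


def to_ego(q: Query, dx: float, dy: float, yaw: float) -> Query:
    """Rigidly transform a neighbour's query position into the ego frame (2D)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return Query(c * q.x - s * q.y + dx, s * q.x + c * q.y + dy, q.feat)


def match_and_aggregate(ego: list[Query], others: list[Query],
                        radius: float = 2.0) -> list[Query]:
    """Spatially match neighbour queries to ego queries, then mean-pool each group.

    Each neighbour query joins the group of its nearest ego query if that ego
    query is within `radius`; otherwise it forms a new single-query group.
    Mean pooling is a placeholder for a learned aggregation module.
    """
    groups: list[list[Query]] = [[q] for q in ego]
    unmatched: list[list[Query]] = []
    for q in others:
        dists = [math.hypot(q.x - e.x, q.y - e.y) for e in ego]
        if dists:
            best = min(range(len(ego)), key=lambda i: dists[i])
            if dists[best] <= radius:
                groups[best].append(q)
                continue
        unmatched.append([q])

    fused: list[Query] = []
    for g in groups + unmatched:
        n = len(g)
        x = sum(q.x for q in g) / n
        y = sum(q.y for q in g) / n
        feat = [sum(vals) / n for vals in zip(*(q.feat for q in g))]
        fused.append(Query(x, y, feat))
    return fused
```

For example, an ego query at (10, 0) and a neighbour query that maps to (10.5, 0) in the ego frame fall into one group and are merged, while a neighbour query far from every ego query survives as a separate object.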

Abstract

Cooperative perception enhances the individual perception capabilities of autonomous vehicles (AVs) by providing a comprehensive view of the environment. However, balancing perception performance and transmission costs remains a significant challenge. Current approaches that transmit region-level features across agents are limited in interpretability and demand substantial bandwidth, making them unsuitable for practical applications. In this work, we propose CoopDETR, a novel cooperative perception framework that introduces object-level feature cooperation via object queries. Our framework consists of two key modules: single-agent query generation, which efficiently encodes raw sensor data into object queries, reducing transmission cost while preserving essential information for detection; and cross-agent query fusion, which includes Spatial Query Matching (SQM) and Object Query Aggregation (OQA) to enable effective interaction between queries. Our experiments on the OPV2V and V2XSet datasets demonstrate that CoopDETR achieves state-of-the-art performance and reduces transmission costs to 1/782 of those of previous methods.
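The bandwidth argument comes down to payload size: a dense bird's-eye-view (BEV) feature map, a common form of region-level intermediate feature, scales with H × W × C, whereas object queries scale with N_q × D. The sketch below compares the two for float32 values. Every size in it (a 100 × 352 × 64 map, 100 queries of dimension 256, a 3-value reference point per query) and the function names are hypothetical choices for illustration, not the configuration behind the paper's 1/782 figure.

```python
FLOAT32_BYTES = 4  # assume uncompressed float32 transmission


def bev_payload_bytes(height: int, width: int, channels: int) -> int:
    """Bytes to send a dense H x W x C BEV feature map (region-level fusion)."""
    return height * width * channels * FLOAT32_BYTES


def query_payload_bytes(num_queries: int, feat_dim: int, ref_dims: int = 3) -> int:
    """Bytes to send N_q object queries, each a feature vector plus a reference point."""
    return num_queries * (feat_dim + ref_dims) * FLOAT32_BYTES


# Hypothetical sizes, chosen only to illustrate the scaling.
bev = bev_payload_bytes(100, 352, 64)
qry = query_payload_bytes(100, 256)
print(f"BEV map: {bev / 1e6:.2f} MB, queries: {qry / 1e3:.1f} kB, "
      f"ratio ~1/{bev / qry:.0f}")
```

With these made-up sizes the query payload is roughly 1/87 of the map; the real saving depends entirely on map resolution, channel count, query dimension, and N_q.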

