QUEST: Query Stream for Practical Cooperative Perception

Siqi Fan, Haibao Yu, Wenxian Yang, Jirui Yuan, Zaiqing Nie

TL;DR

The paper addresses occlusion and range limitations in autonomous driving by proposing query cooperation, a middle-ground paradigm between scene-level feature cooperation and instance-level result fusion. It introduces QUEST, a cross-agent query stream in which transformer object queries flow between agents and interact in two ways: fusion for objects that both agents perceive, and complementation for objects the receiving agent has not seen. A dual-space embedding and attentive fusion align and merge the queries. On DAIR-V2X-Seq, QUEST delivers substantial improvements over vehicle-only and traditional cooperation approaches, with $AP_{BEV|0.5}=20.3\%$ and $AP_{3D|0.5}=14.1\%$, and demonstrates transmission flexibility and robustness to packet dropout. The work highlights practical benefits for cross-agent perception, provides camera-centric cooperation labels, and outlines extensions toward temporal cooperation and end-to-end cooperative driving. It also notes deployment challenges such as the need for query-based onboard systems and cross-architecture alignment. These findings suggest that query-based interaction can offer a scalable, interpretable pathway to robust cooperative perception in real-world settings.

Abstract

Cooperative perception can effectively enhance individual perception performance by providing additional viewpoints and expanding the sensing field. Existing cooperation paradigms are either interpretable (result cooperation) or flexible (feature cooperation). In this paper, we propose the concept of query cooperation to enable interpretable, instance-level, flexible feature interaction. To make the concept concrete, we propose a cooperative perception framework, termed QUEST, which lets the query stream flow among agents. Cross-agent queries interact via fusion for co-aware instances and complementation for instances that an individual agent is unaware of. Taking camera-based vehicle-infrastructure perception as a typical practical application scenario, the experimental results on the real-world dataset DAIR-V2X-Seq demonstrate the effectiveness of QUEST and further reveal the advantages of the query cooperation paradigm in transmission flexibility and robustness to packet dropout. We hope our work can further facilitate cross-agent representation interaction for better cooperative perception in practice.
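The transmission-side claims can be pictured with a toy model. The sketch below is not the paper's protocol. It assumes that each query is a (confidence, feature) pair, that a sender keeps only the top-scoring queries under a hypothetical per-frame budget `max_queries`, and that the channel drops each query independently with probability `drop_prob`. The names `select_for_transmission` and `simulate_dropout` are invented for illustration.

```python
import random
from typing import List, Tuple

# Assumed representation: (confidence score, feature vector).
Query = Tuple[float, List[float]]


def select_for_transmission(queries: List[Query], max_queries: int) -> List[Query]:
    """Keep the highest-confidence queries that fit a hypothetical budget."""
    ranked = sorted(queries, key=lambda q: q[0], reverse=True)
    return ranked[:max_queries]


def simulate_dropout(queries: List[Query], drop_prob: float,
                     rng: random.Random) -> List[Query]:
    """Drop each query independently with probability drop_prob (assumed model)."""
    return [q for q in queries if rng.random() >= drop_prob]


# Toy infrastructure-side queries (values are made up).
infra_queries: List[Query] = [
    (0.9, [0.1, 0.2]),
    (0.2, [0.3, 0.4]),
    (0.7, [0.5, 0.6]),
    (0.05, [0.7, 0.8]),
]

sent = select_for_transmission(infra_queries, max_queries=2)
received = simulate_dropout(sent, drop_prob=0.5, rng=random.Random(0))
```

In this model the unit of transmission is a single instance, so whatever subset arrives is still a usable set of queries. That is one plausible intuition for the robustness the abstract reports, not a result taken from the paper.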

Paper Structure (15 sections, 7 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Query cooperation enables instance-level feature cooperation, which is more interpretable than scene-level feature cooperation and more flexible than instance-level result cooperation.
  • Figure 2: Architecture of the QUEST framework.
  • Figure 3: Illustration of the location grid for dual-space query embedding. Compared with exact center-based matching, grid-based matching is more robust to location noise.
  • Figure 4: Illustration of cross-agent query complementation. Local queries with low confidence scores are replaced with received queries, which avoids additional computational cost.
  • Figure 5: Visualization examples in different scenes. Red: ground truth. Blue: predictions of QUEST.
  • ...and 2 more figures
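
The mechanisms named in the captions of Figures 3 and 4 can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The `Query` fields, the `cell_size` value, and the function names are hypothetical. A score-weighted average stands in for the paper's attentive fusion. The rule that a received query replaces a local one only when its score is higher is an added assumption.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Query:
    feature: List[float]         # instance-level feature vector
    center: Tuple[float, float]  # predicted (x, y) location in a shared frame
    score: float                 # confidence in [0, 1]


def grid_cell(center: Tuple[float, float], cell_size: float) -> Tuple[int, int]:
    """Map a location to the index of the grid cell that contains it."""
    return (math.floor(center[0] / cell_size), math.floor(center[1] / cell_size))


def interact(local: List[Query], received: List[Query],
             cell_size: float = 2.0) -> List[Query]:
    """Fuse co-aware queries, then complement with queries only the sender saw."""
    out = list(local)
    cell_to_index: Dict[Tuple[int, int], int] = {}
    for i, q in enumerate(local):
        cell_to_index.setdefault(grid_cell(q.center, cell_size), i)

    fused = set()
    unmatched: List[Query] = []
    for r in received:
        i = cell_to_index.get(grid_cell(r.center, cell_size))
        if i is None:
            unmatched.append(r)  # no local query shares this cell
            continue
        # Stand-in for attentive fusion: score-weighted average (assumption).
        loc = out[i]
        total = loc.score + r.score
        w = loc.score / total if total > 0 else 0.5
        out[i] = Query(
            feature=[w * a + (1 - w) * b for a, b in zip(loc.feature, r.feature)],
            center=(w * loc.center[0] + (1 - w) * r.center[0],
                    w * loc.center[1] + (1 - w) * r.center[1]),
            score=max(loc.score, r.score),
        )
        fused.add(i)

    # Complementation: overwrite the lowest-confidence local queries that were
    # not fused, so the total number of queries stays fixed.
    replaceable = sorted((i for i in range(len(out)) if i not in fused),
                         key=lambda i: out[i].score)
    unmatched.sort(key=lambda q: q.score, reverse=True)
    for i, r in zip(replaceable, unmatched):
        if r.score > out[i].score:  # guard is an assumption of this sketch
            out[i] = r
    return out


# Toy example: one co-aware object and one object seen only by the sender.
local_queries = [Query([1.0, 0.0], (1.0, 1.0), 0.8),
                 Query([0.0, 0.0], (50.0, 50.0), 0.05)]
received_queries = [Query([0.0, 1.0], (1.5, 0.5), 0.8),
                    Query([5.0, 5.0], (20.0, 20.0), 0.9)]
result = interact(local_queries, received_queries)
```

Here the first received query lands in the same cell as the first local query even though their centers differ by half a metre, so the two are fused. The second received query has no local counterpart and takes the slot of the low-confidence local query. Grid matching has its own failure mode: two estimates of the same object that straddle a cell boundary will not match.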