GCP: Guarded Collaborative Perception with Spatial-Temporal Aware Malicious Agent Detection

Yihang Tao, Senkang Hu, Yue Hu, Haonan An, Hangcheng Cao, Yuguang Fang

TL;DR

This paper addresses the vulnerability of collaborative perception (CP) in autonomous driving to adversarial messages by introducing a novel BAC attack that exploits blind regions and temporal patterns. It then proposes GCP, a Guarded CP framework that integrates a confidence-scaled spatial concordance loss, LSTM-AE-based temporal BEV-flow reconstruction, and a joint spatial-temporal Benjamini-Hochberg test to detect malicious agents. Empirical results on V2X-Sim show GCP achieving significant improvements over state-of-the-art defenses, notably up to 34.69% in AP@0.5 under BAC, while maintaining robustness across other attacks and real-time efficiency. The work advances secure CP by leveraging both spatial and temporal cues for reliable, scalable defense in multi-agent driving scenarios.

Abstract

Collaborative perception significantly enhances autonomous driving safety by extending each vehicle's perception range through message sharing among connected and autonomous vehicles. Unfortunately, it is also vulnerable to adversarial message attacks from malicious agents, resulting in severe performance degradation. While existing defenses employ hypothesis-and-verification frameworks to detect malicious agents based on single-shot outliers, they overlook temporal message correlations and can therefore be circumvented by subtle yet harmful perturbations in model input and output spaces. This paper reveals a novel blind area confusion (BAC) attack that compromises existing single-shot outlier-based detection methods. As a countermeasure, we propose GCP, a Guarded Collaborative Perception framework based on spatial-temporal aware malicious agent detection, which maintains single-shot spatial consistency through a confidence-scaled spatial concordance loss, while simultaneously examining temporal anomalies by reconstructing historical bird's eye view motion flows in low-confidence regions. We also employ a joint spatial-temporal Benjamini-Hochberg test to synthesize dual-domain anomaly results for reliable malicious agent detection. Extensive experiments demonstrate GCP's superior performance under diverse attack scenarios, achieving an improvement of up to 34.69% in AP@0.5 over state-of-the-art CP defense strategies under BAC attacks, while maintaining consistent improvements of 5-8% under other typical attacks. Code will be released at https://github.com/CP-Security/GCP.git.
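
To make the Benjamini-Hochberg step concrete, the following is a minimal sketch of the standard step-up procedure applied to per-agent p-values. It is not the paper's implementation: the function names `benjamini_hochberg` and `flag_agents`, the agent identifiers, the p-values, and in particular the choice to pool spatial and temporal p-values into one family and flag an agent when either of its hypotheses is rejected are all illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard BH step-up procedure.

    Returns a list of booleans, True where the null hypothesis
    ("this message is benign") is rejected at false discovery rate alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


def flag_agents(spatial_p, temporal_p, alpha=0.05):
    """Hypothetical joint test: pool both domains, then run BH once.

    ASSUMPTION: an agent is flagged if either its spatial or its
    temporal hypothesis is rejected. The paper may combine them differently.
    """
    agents = sorted(spatial_p)
    pooled = [spatial_p[a] for a in agents] + [temporal_p[a] for a in agents]
    rejected = benjamini_hochberg(pooled, alpha)
    n = len(agents)
    return {a for i, a in enumerate(agents) if rejected[i] or rejected[n + i]}


# Illustrative p-values only; not taken from the paper.
spatial_p = {"cav_1": 0.60, "cav_2": 0.40, "cav_3": 0.001}
temporal_p = {"cav_1": 0.70, "cav_2": 0.003, "cav_3": 0.50}
flagged = flag_agents(spatial_p, temporal_p)
```

In this toy input, `cav_2` looks normal in the spatial domain and is caught only through its temporal p-value, which is the kind of case a single-shot spatial check would miss.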

Paper Structure (22 sections, 27 equations, 8 figures, 4 tables, 2 algorithms)

Figures (8)

  • Figure 1: Illustration of security challenges and defense mechanisms in CP. While CP systems are vulnerable to adversarial messages from malicious agents, our proposed GCP framework provides comprehensive protection through joint spatial-temporal consistency verification, effectively safeguarding the system against various attack patterns.
  • Figure 2: Overview of the proposed blind area confusion (BAC) attack. The malicious agent first establishes communication with the victim ego CAV to obtain collaborative messages, then infers the victim's blind regions through differential detection analysis and region segmentation. Finally, it generates adversarial perturbations guided by the inferred confidence mask to confuse the victim's perception defense system.
  • Figure 3: Overview of the proposed GCP framework. GCP performs joint spatial-temporal consistency verification through two key components: (1) a confidence-scaled spatial concordance loss that adaptively evaluates detection consistency, and (2) an LSTM-AE-based temporal BEV flow reconstruction that captures motion patterns in CP.
  • Figure 4: Architecture of LSTM-AE-based BEV flow reconstruction. The input BEV flow vector consists of 8-dimensional features representing corner points of detected objects. The encoded latent features are repeated $K + 1$ times before decoding, followed by a TimeDistributed layer for temporal-aware reconstruction of object motion patterns.
  • Figure 5: Comparative AP results under different cached frame lengths and consecutive KF interpolation times on the V2X-Sim dataset. Attack settings: $m = 2$, $\lambda = 0.25$; $\Delta_i = \Delta_o = 0.5$. (a)-(d): AP@0.7 results under different attacks and interpolation budgets; (e)-(h): AP@0.7 results under different attacks and cached frame lengths.
  • ...and 3 more figures
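
The data shapes described in the Figure 4 caption can be sketched as follows. This covers only the input layout and the latent-repeat step, not the LSTM encoder or decoder. The helper names, the corner ordering, the latent size, and the toy coordinates are assumptions made for illustration; the caption fixes only the 8-dimensional input and the $K + 1$ repetitions.

```python
def flatten_corners(corners):
    """Flatten four (x, y) BEV corner points into one 8-dimensional vector."""
    if len(corners) != 4:
        raise ValueError("expected exactly four corner points")
    return [coord for point in corners for coord in point]


def build_flow_sequence(track):
    """Turn a track of K + 1 boxes into a (K + 1) x 8 input sequence."""
    return [flatten_corners(corners) for corners in track]


def repeat_latent(latent, k):
    """Repeat the encoded latent vector K + 1 times, one copy per frame.

    This mirrors what a repeat-vector layer does before a sequence decoder.
    """
    return [list(latent) for _ in range(k + 1)]


# Toy track: one box moving 1 m along x over K + 1 = 3 frames (K = 2).
K = 2
track = [
    [(x0, 0.0), (x0 + 4.0, 0.0), (x0 + 4.0, 2.0), (x0, 2.0)]
    for x0 in (0.0, 1.0, 2.0)
]
sequence = build_flow_sequence(track)           # shape: (K + 1) x 8
decoder_input = repeat_latent([0.1, -0.3], K)   # shape: (K + 1) x latent_dim
```

A decoder followed by a per-time-step output layer would then map each of the $K + 1$ latent copies back to an 8-dimensional vector, so the reconstruction has the same shape as `sequence`.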