SRA-CP: Spontaneous Risk-Aware Selective Cooperative Perception

Jiaxi Liu, Chengyuan Ma, Hang Zhou, Weizhe Tang, Shixiao Liang, Haoyang Ding, Xiaopeng Li, Bin Ran

TL;DR

SRA-CP tackles the dual challenge of large perception data volumes and dynamic, ad-hoc vehicle encounters in cooperative perception. It introduces a spontaneous, risk-aware selective CP framework with routine broadcasts of perception coverage and on-demand, risk-driven handshakes, constrained by a per-link byte budget. The methodology combines a perceptual risk identification model, selective information sharing with spatial and risk masks, and a dual-attention fusion decoder to preserve safety-critical perception under bandwidth limits. Empirical results on the OPV2V dataset show less than 1% loss in safety-critical object AP versus generic CP while using only 20% of the communication bandwidth, and a 15% improvement over risk-agnostic selective CP baselines for critical objects. These findings illustrate the approach’s potential for scalable, real-world CP in highly dynamic traffic environments, enabling safer autonomous driving with limited communication resources.

Abstract

Cooperative perception (CP) offers significant potential to overcome the limitations of single-vehicle sensing by enabling information sharing among connected vehicles (CVs). However, existing generic CP approaches must transmit large volumes of perception data that are irrelevant to driving safety, exceeding the available communication bandwidth. Moreover, most CP frameworks rely on pre-defined communication partners, making them unsuitable for dynamic traffic environments. This paper proposes a Spontaneous Risk-Aware Selective Cooperative Perception (SRA-CP) framework to address these challenges. SRA-CP introduces a decentralized protocol in which connected agents continuously broadcast lightweight perception coverage summaries and initiate targeted cooperation only when risk-relevant blind zones are detected. A perceptual risk identification module enables each CV to locally assess the impact of occlusions on its driving task and determine whether cooperation is necessary. When CP is triggered, the ego vehicle selects appropriate peers based on shared perception coverage and engages in selective information exchange through a fusion module that prioritizes safety-critical content and adapts to bandwidth constraints. We evaluate SRA-CP on a public dataset against several representative baselines. Results show that SRA-CP achieves less than 1% average precision (AP) loss for safety-critical objects compared to generic CP, while using only 20% of the communication bandwidth. Moreover, it improves perception performance by 15% over existing selective CP methods that do not incorporate risk awareness.
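
The trigger-and-select logic described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the `CoverageSummary` type, the grid-cell representation of coverage, the risk threshold, and the greedy set-cover peer selection are all assumptions made here for illustration only.

```python
from dataclasses import dataclass

Cell = tuple[int, int]  # hypothetical BEV grid index (row, col)


@dataclass(frozen=True)
class CoverageSummary:
    """Hypothetical lightweight broadcast: the BEV cells a vehicle currently perceives."""
    vehicle_id: str
    covered_cells: frozenset[Cell]


def risky_blind_cells(blind_cells, risk, threshold):
    """Return the blind cells whose locally estimated risk reaches the threshold.

    An empty result means no cooperation is requested (assumed trigger rule).
    """
    return {c for c in blind_cells if risk.get(c, 0.0) >= threshold}


def select_peers(needed, summaries, max_peers):
    """Greedy set cover (an assumed heuristic): repeatedly pick the peer whose
    broadcast coverage fills the most still-uncovered risky cells."""
    remaining = set(needed)
    candidates = list(summaries)
    chosen = []
    while remaining and candidates and len(chosen) < max_peers:
        best = max(candidates, key=lambda s: len(remaining & s.covered_cells))
        gain = remaining & best.covered_cells
        if not gain:
            break  # no peer can help with what is left
        chosen.append(best.vehicle_id)
        remaining -= gain
        candidates.remove(best)
    return chosen, remaining


# Toy scenario: four occluded cells, three of them risky for the ego's task.
blind = {(0, 0), (0, 1), (1, 1), (2, 2)}
risk = {(0, 0): 0.9, (0, 1): 0.8, (1, 1): 0.1, (2, 2): 0.7}
needed = risky_blind_cells(blind, risk, threshold=0.5)

summaries = [
    CoverageSummary("A", frozenset({(0, 0), (0, 1)})),
    CoverageSummary("B", frozenset({(2, 2), (1, 1)})),
    CoverageSummary("C", frozenset({(0, 0)})),
]
peers, uncovered = select_peers(needed, summaries, max_peers=2)
```

In this toy scenario the ego requests cooperation for three risky cells and hands-shakes with peers A and B only; peer C is never contacted because it adds no new coverage.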

Paper Structure

This paper contains 37 sections, 16 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Comparison between (a) generic CP with full-time information exchange vs. the proposed risk-aware selective CP activated by risky blind-spot events; and (b) pre-arranged CP constrained by predefined communication partners vs. spontaneous CP enabling dynamic ad-hoc cooperation in arbitrary encounter situations.
  • Figure 2: Spontaneous Risk-Aware Selective Cooperative Perception (SRA-CP)
  • Figure 3: End‑to‑end architecture of Selective Information Sharing and Fusion. Each co‑operative vehicle $i \in \{e, k, j\}$ projects its raw point cloud $\boldsymbol{\Phi}\!\left(\mathcal{P}_{i}\right)$ to the ego Bird’s‑Eye‑View (BEV) frame and encodes it through a shared feature‑encoder, yielding $F_{i}$. Risk‑aware communication (Sec. \ref{sec:risk_comm}) attaches two light‑weight masks, the spatial mask $S_{i}$ and the risk mask $R_{i}$, to the feature map and broadcasts only these three tensors, avoiding transmission of raw point clouds. The ego car receives the partner streams and performs dual‑attention feature fusion (Sec. \ref{sec:dual_attention}): a safety‑focused selector prunes partner features with $(S_{i},R_{i})$ and a location‑wise multi‑head attention block aligns the surviving cells with the ego map $F_{e}$, producing $\tilde{F}_{e}$. Finally, two heads operate on $\tilde{F}_{e}$: (i) a Risk Decoder refines a dense risk heat‑map, and (ii) a Detection Decoder outputs 3‑D bounding boxes.
  • Figure 4: Risk‑aware communication pipeline executed on each partner vehicle $j$. The shared feature map $F_{j}$ is processed by two light‑weight heads: (i) a spatial‑confidence map generator produces a spatial confidence map $C_{s,j}$ that highlights semantically important cells; an adaptive sampling module then selects, based on the scenario, a sparse binary spatial mask $S_{j}$ for transmission. (ii) A risk‑confidence map generator uses $F_{j}$ together with the ego planned trajectory $P_{e}$ and speed $v_{e}$ to compute a risk map $C_{r,j}$; adaptive sampling converts it into a binary risk mask $R_{j}$. Both masks $\bigl(S_{j}, R_{j}\bigr)$ are sent to the ego vehicle, while a miniature Risk Decoder can optionally convert $C_{r,j}$ into a dense risk heat‑map used as supervision during training.
  • Figure 5: Dual‑attention feature fusion. Remote feature tensors $F_{k}$ and $F_{j}$ are first filtered by a Safety‑focused Feature Selection block that combines each partner’s spatial mask $S_{i}$ and risk mask $R_{i}$, yielding sparsified maps $\tilde{F}_{k}$ and $\tilde{F}_{j}$. The ego map $F_{e}$ and the sparsified partner maps are then fused by a location‑wise multi‑head attention module that performs per‑cell key–query interactions, producing an enriched representation $\tilde{F}_{e}$. This two‑stage design discards bandwidth‑hungry, low‑value regions before attention, so both communication and computation focus on areas that are simultaneously safety‑critical and semantically informative. During this process, only three low‑bandwidth tensors $(\tilde{F}_{j}, S_{j}, R_{j})$ leave the vehicle, preserving privacy and saving channel capacity.
  • ...and 5 more figures
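
The masking and fusion steps described in the captions of Figures 3 to 5 can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the paper's method: adaptive sampling is approximated by a top-k cut derived from a per-link byte budget, the two masks are combined by element-wise AND (suggested by, but not confirmed in, the Figure 5 caption), and the location-wise attention is single-head with no learned projections. All function names are hypothetical.

```python
import math


def cells_within_budget(byte_budget, channels, bytes_per_value=4):
    """Number of feature cells that fit in the budget (assumes float32 values)."""
    return byte_budget // (channels * bytes_per_value)


def top_k_mask(conf, k):
    """Binary mask keeping the k highest-confidence cells of a 2-D map.

    Stand-in for the adaptive sampling module (assumption: plain top-k).
    """
    rows, cols = len(conf), len(conf[0])
    ranked = sorted(
        ((conf[r][c], r, c) for r in range(rows) for c in range(cols)),
        key=lambda t: -t[0],
    )
    mask = [[0] * cols for _ in range(rows)]
    for _, r, c in ranked[: max(k, 0)]:
        mask[r][c] = 1
    return mask


def select_features(features, spatial_mask, risk_mask):
    """Keep feature vectors only where both masks are 1 (assumed AND rule).

    Returns a sparse {(row, col): vector} dict, i.e. what would be transmitted.
    """
    return {
        (r, c): features[r][c]
        for r in range(len(features))
        for c in range(len(features[0]))
        if spatial_mask[r][c] and risk_mask[r][c]
    }


def fuse_cell(ego_vec, partner_vecs):
    """Per-cell scaled dot-product attention: the ego vector is the query, and
    the ego plus partner vectors at the same cell serve as keys and values."""
    candidates = [ego_vec] + list(partner_vecs)
    scale = math.sqrt(len(ego_vec))
    scores = [sum(q * k for q, k in zip(ego_vec, v)) / scale for v in candidates]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [
        sum(w * v[d] for w, v in zip(weights, candidates))
        for d in range(len(ego_vec))
    ]


# Toy 2x2 partner maps with 2-channel features and a 16-byte budget.
C_s = [[0.9, 0.2], [0.6, 0.1]]   # spatial confidence
C_r = [[0.8, 0.7], [0.1, 0.3]]   # risk confidence
F_j = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]

k = cells_within_budget(byte_budget=16, channels=2)
S_j = top_k_mask(C_s, k)
R_j = top_k_mask(C_r, k)
sent = select_features(F_j, S_j, R_j)
fused = fuse_cell([0.0, 0.0], [sent[(0, 0)]])
```

With these toy inputs the budget allows two cells per mask, only cell (0, 0) survives both masks, and a zero ego query attends equally to itself and the partner vector. Whether the actual system combines the masks by intersection or by some other rule is not stated in the captions, so the AND rule should be read as one plausible choice.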