R-ACP: Real-Time Adaptive Collaborative Perception Leveraging Robust Task-Oriented Communications

Zhengru Fang, Jingjing Wang, Yanan Ma, Yihang Tao, Yiqin Deng, Xianhao Chen, Yuguang Fang

TL;DR

R-ACP presents a real-time framework for robust, bandwidth-aware collaborative perception across multiple cameras by jointly addressing calibration drift and data timeliness. It introduces AoPT to fuse freshness with target relevance, a Re-ID–based channel-aware self-calibration mechanism with adaptive feature quantization, and an IB-based encoding scheme to balance transmission cost with inference quality. A priority-based streaming scheduler and robust multi-view fusion further mitigate packet loss, improving MODA by 25.49% and reducing communication cost by 51.36% under severely poor channel conditions. The approach enables scalable, real-time multi-UGV perception with practical edge-computing deployments.

Abstract

Collaborative perception enhances sensing in multirobot and vehicular networks by fusing information from multiple agents, improving perception accuracy and sensing range. However, mobility and non-rigid sensor mounts introduce extrinsic calibration errors, necessitating online calibration, further complicated by limited overlap in sensing regions. Moreover, maintaining fresh information is crucial for timely and accurate sensing. To address calibration errors and ensure timely and accurate perception, we propose a robust task-oriented communication strategy to optimize online self-calibration and efficient feature sharing for Real-time Adaptive Collaborative Perception (R-ACP). Specifically, we first formulate an Age of Perceived Targets (AoPT) minimization problem to capture data timeliness of multi-view streaming. Then, in the calibration phase, we introduce a channel-aware self-calibration technique based on reidentification (Re-ID), which adaptively compresses key features according to channel capacities, effectively addressing calibration issues via spatial and temporal cross-camera correlations. In the streaming phase, we tackle the trade-off between bandwidth and inference accuracy by leveraging an Information Bottleneck (IB) based encoding method to adjust video compression rates based on task relevance, thereby reducing communication overhead and latency. Finally, we design a priority-aware network to filter corrupted features to mitigate performance degradation from packet corruption. Extensive studies demonstrate that our framework outperforms five baselines, improving multiple object detection accuracy (MODA) by 25.49% and reducing communication costs by 51.36% under severely poor channel conditions. Code will be made publicly available: github.com/fangzr/R-ACP.

Paper Structure (27 sections, 3 theorems, 36 equations, 17 figures, 3 tables)

Key Result

Proposition 1

The AoI for UGV $k$ under deterministic sampling and transmission delays is given by: where $d_k^T = \frac{D}{C_k} = D \left[ B_k \log_2\left( 1 + \frac{P_t G_k}{N_0 B_k} \right) \right]^{-1}$, $B_k$ represents the bandwidth allocated to the link between UGV $k$ and the edge server, $P_t$ is the transmission power, $G_k$ is the channel gain for UGV $k$, $N_0$ is the noise power spectr
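
The transmission delay term $d_k^T = D / C_k$ in Proposition 1 can be evaluated numerically. The following is a minimal sketch, not the authors' code: the function name `transmission_delay` and its parameter names are hypothetical, $D$ is assumed to be the payload size in bits, and $N_0$ is assumed to be the noise power spectral density in W/Hz (the excerpt above is cut off before these are fully defined).

```python
import math


def transmission_delay(data_bits, bandwidth_hz, tx_power_w, channel_gain, noise_psd_w_per_hz):
    """Evaluate d_k^T = D / (B_k * log2(1 + P_t * G_k / (N_0 * B_k))).

    Assumptions (not stated in the excerpt): data_bits is D in bits,
    noise_psd_w_per_hz is N_0 in W/Hz, and the result is in seconds.
    """
    if bandwidth_hz <= 0 or noise_psd_w_per_hz <= 0:
        raise ValueError("bandwidth and noise density must be positive")
    # Signal-to-noise ratio over the allocated bandwidth B_k.
    snr = tx_power_w * channel_gain / (noise_psd_w_per_hz * bandwidth_hz)
    # Shannon capacity C_k of the UGV-to-edge-server link, in bits per second.
    capacity_bps = bandwidth_hz * math.log2(1.0 + snr)
    return data_bits / capacity_bps


if __name__ == "__main__":
    # Illustrative values only: 1 Mbit payload, 1 MHz bandwidth, SNR of 1.
    delay = transmission_delay(1e6, 1e6, 1.0, 1e-6, 1e-12)
    print(f"transmission delay: {delay:.3f} s")
```

With the other parameters held fixed, allocating more bandwidth $B_k$ lowers the SNR term but raises capacity overall, so the delay shrinks; this is the lever a bandwidth-allocation scheme would act on.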

Figures (17)

  • Figure 1: Effect of unpredictable accidents involving UGVs on camera extrinsic parameters and perception error rates.
  • Figure 2: The system consists of several UGVs equipped with cameras, collaboratively tracking pedestrians.
  • Figure 3: The flow of the self-calibration method using multiview feature sharing.
  • Figure 4: Illustrations of different age-based functions.
  • Figure 5: Framework of R-ACP.
  • ...and 12 more figures

Theorems & Definitions (7)

  • Proposition 1
  • Proof 1
  • Definition 1
  • Proposition 2
  • Proof 1
  • Proposition 3
  • Proof 2