JigsawComm: Joint Semantic Feature Encoding and Transmission for Communication-Efficient Cooperative Perception

Chenyi Wang, Zhaowei Li, Ming F. Li, Wujie Wen

TL;DR

JigsawComm tackles the bandwidth bottleneck in cooperative perception (CP) by jointly learning a sparse semantic feature encoder and a differentiable transmission scheduler guided by a learned utility proxy. The method produces meta utility maps, applies a provably optimal top-1 per-cell transmission policy, and keeps the communication cost at $O(1)$ as the number of agents grows, while delivering competitive or superior perception accuracy. It reduces data volume by up to $>500\times$ on OPV2V and $116\times$ on DAIR-V2X at real-time latency, matching or outperforming state-of-the-art baselines. These results highlight the importance of reducing cross-agent redundancy and point to a practical, scalable approach for goal-oriented CP in real-world V2X networks.

Abstract

Multi-agent cooperative perception (CP) promises to overcome the inherent occlusion and sensing-range limitations of single-agent systems (e.g., in autonomous driving). However, its practicality is severely constrained by limited communication bandwidth. Existing approaches attempt to improve bandwidth efficiency via compression or heuristic message selection, without considering the semantic relevance or cross-agent redundancy of sensory data. We argue that a practical CP system must maximize the contribution of every transmitted bit to the final perception task by extracting and transmitting semantically essential and non-redundant data. In this paper, we formulate a joint semantic feature encoding and transmission problem, which aims to maximize CP accuracy under limited bandwidth. To solve this problem, we introduce JigsawComm, an end-to-end trained, semantic-aware, and communication-efficient CP framework that learns to ``assemble the puzzle'' of multi-agent feature transmission. It uses a regularized encoder to extract semantically relevant and sparse features, and a lightweight Feature Utility Estimator to predict the contribution of each agent's features to the final perception task. The resulting meta utility maps are exchanged among agents and leveraged to compute a provably optimal transmission policy, which selects, for each location, the features from the agent with the highest utility score. This policy inherently eliminates redundancy and achieves a scalable $\mathcal{O}(1)$ communication cost as the number of agents increases. On the OPV2V and DAIR-V2X benchmarks, JigsawComm reduces the total data volume by up to $>500\times$ while achieving matching or superior accuracy compared to state-of-the-art methods.
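
The per-location selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the nested-list representation of a utility map, the `threshold` parameter, and the example values are all assumptions made for illustration. Each agent contributes one utility score per spatial cell, and every cell is assigned to at most one agent.

```python
from typing import List

UtilityMap = List[List[float]]  # hypothetical format: one score per cell (H x W)


def top1_assignment(utility_maps: List[UtilityMap],
                    threshold: float = 0.0) -> List[List[int]]:
    """Return, per cell, the index of the agent with the highest utility.

    A cell gets -1 when no agent's utility exceeds `threshold`, meaning
    nothing is transmitted for it. `threshold` is an assumed parameter.
    Ties go to the lowest agent index.
    """
    height = len(utility_maps[0])
    width = len(utility_maps[0][0])
    owner = [[-1] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            best_agent, best_score = -1, threshold
            for agent, umap in enumerate(utility_maps):
                if umap[r][c] > best_score:
                    best_agent, best_score = agent, umap[r][c]
            owner[r][c] = best_agent
    return owner


def transmitted_cells(owner: List[List[int]]) -> int:
    """Count cells that some agent transmits."""
    return sum(1 for row in owner for agent in row if agent >= 0)


# Made-up utility maps for three agents on a 2 x 3 grid.
agent0 = [[0.9, 0.1, 0.0], [0.2, 0.0, 0.0]]
agent1 = [[0.3, 0.8, 0.0], [0.1, 0.0, 0.6]]
agent2 = [[0.5, 0.7, 0.0], [0.4, 0.0, 0.2]]

owner = top1_assignment([agent0, agent1, agent2])
print(owner)                     # [[0, 1, -1], [2, -1, 1]]
print(transmitted_cells(owner))  # 4
```

Because each cell is sent by at most one agent, the number of transmitted cells in this sketch can never exceed the grid size, however many agents join. That is the intuition behind a communication cost that does not grow with the number of agents.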

Paper Structure

This paper contains 17 sections, 3 theorems, 11 equations, 8 figures, 3 tables.

Key Result

Proposition 1

For any $(i,l)$ with $u_i^l\neq \tau$ and with no injected noise at inference, we have Therefore, we have

Figures (8)

  • Figure 1: A sample scenario from the OPV2V dataset (left). Under the same bandwidth limit, prior work Where2comm (Hu et al., 2022) (mid) does not use bandwidth efficiently, leading to missed detections and reduced accuracy. JigsawComm (right) maximizes every bit's contribution, enabling complete and accurate detection. Quantitative evaluation and a discussion of the 'cost' of redundancy can be found in the evaluation and discussion sections, respectively.
  • Figure 2: Overview of JigsawComm.
  • Figure 3: Visualization of meta utility map examples. The top-1 aggregated utility map eliminates redundancy by selecting the agent with the highest utility at each location.
  • Figure 4: Visualization of feature map examples.
  • Figure 5: Tradeoff between accuracy and required bandwidth.
  • ...and 3 more figures

Theorems & Definitions (5)

  • Proposition 1: Consistency of differentiable and deterministic schedulers
  • Theorem 1: Singleton optimality
  • proof
  • Theorem 2: Greedy optimality under equal costs
  • proof
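
The statement of Theorem 2 is not reproduced in this summary, so the following is only a sketch of the general principle its title suggests, under two assumptions: every candidate costs the same to transmit, and total utility is the sum of the selected utilities. In that setting, picking the highest-utility candidates until the budget runs out is optimal, which the brute-force check below confirms on a small made-up instance. The names `greedy_select`, `total_utility`, and `brute_force_best` are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def greedy_select(utilities: List[float], budget: int) -> List[int]:
    """Pick up to `budget` candidate indices with the highest utility.

    Assumes equal transmission cost per candidate (an assumption of this
    sketch), so the budget is simply a count.
    """
    ranked = sorted(range(len(utilities)),
                    key=lambda i: utilities[i], reverse=True)
    return sorted(ranked[:max(budget, 0)])


def total_utility(utilities: List[float], chosen: Sequence[int]) -> float:
    """Additive utility of a selection, summed in index order."""
    return sum(utilities[i] for i in sorted(chosen))


def brute_force_best(utilities: List[float], budget: int) -> float:
    """Best achievable total utility over all subsets of size <= budget."""
    size = min(max(budget, 0), len(utilities))
    return max(total_utility(utilities, combo)
               for k in range(size + 1)
               for combo in combinations(range(len(utilities)), k))


# Made-up per-candidate utilities and a budget of two transmissions.
utilities = [0.9, 0.8, 0.4, 0.6]
chosen = greedy_select(utilities, budget=2)
print(chosen)  # [0, 1]
print(total_utility(utilities, chosen) == brute_force_best(utilities, 2))  # True
```

If costs differ across candidates, a plain top-k ranking is no longer guaranteed to be optimal, which is presumably why the theorem title restricts itself to equal costs.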