JigsawComm: Joint Semantic Feature Encoding and Transmission for Communication-Efficient Cooperative Perception
Chenyi Wang, Zhaowei Li, Ming F. Li, Wujie Wen
TL;DR
JigsawComm tackles the bandwidth bottleneck in cooperative perception (CP) by jointly learning a semantic sparse feature encoder and a differentiable transmission scheduler guided by a learned utility proxy. The method produces meta utility maps, enforces a provably optimal top-1 per-cell transmission policy, and keeps communication cost at $\mathcal{O}(1)$ as the number of agents grows. Its data reduction reaches over $500\times$ on OPV2V and $116\times$ on DAIR-V2X at real-time latency, with perception accuracy that matches or exceeds state-of-the-art baselines. These results highlight the importance of eliminating cross-agent redundancy and point to a practical, scalable approach to goal-oriented CP in real-world V2X networks.
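The TL;DR does not say how the scheduler is made differentiable. One common option, shown here purely as an assumed stand-in, is a temperature-controlled softmax over the agents' utility scores at each cell: training can use the soft weights, and as the temperature `tau` shrinks they approach the hard top-1 choice used at transmission time. The functions `soft_selection` and `hard_selection` and the toy scores are hypothetical and are not JigsawComm's API.

```python
import math
from typing import List


def soft_selection(utilities: List[float], tau: float) -> List[float]:
    """Softmax over the agents' utility scores for one cell (assumed relaxation)."""
    m = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp((u - m) / tau) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def hard_selection(utilities: List[float]) -> List[float]:
    """One-hot choice of the highest-utility agent (ties -> lowest index)."""
    # max() returns the first maximal index, so ties go to the lowest index
    best = max(range(len(utilities)), key=lambda i: utilities[i])
    return [1.0 if i == best else 0.0 for i in range(len(utilities))]


if __name__ == "__main__":
    cell_utilities = [0.2, 0.9, 0.5]  # toy scores from three agents for one cell
    for tau in (1.0, 0.1, 0.01):
        weights = soft_selection(cell_utilities, tau)
        print(f"tau={tau}: {[round(w, 3) for w in weights]}")
    print("hard:", hard_selection(cell_utilities))
```

Printing the weights for decreasing `tau` shows the probability mass concentrating on agent 1, the agent with the highest score. Gumbel-softmax with a straight-through estimator is another standard relaxation that could fill the same role; which one JigsawComm uses is not stated here.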
Abstract
Multi-agent cooperative perception (CP) promises to overcome the inherent occlusion and sensing-range limitations of single-agent systems (e.g., in autonomous driving). However, its practicality is severely constrained by limited communication bandwidth. Existing approaches attempt to improve bandwidth efficiency via compression or heuristic message selection, without considering the semantic relevance or cross-agent redundancy of sensory data. We argue that a practical CP system must maximize the contribution of every transmitted bit to the final perception task by extracting and transmitting only semantically essential, non-redundant data. In this paper, we formulate a joint semantic feature encoding and transmission problem, which aims to maximize CP accuracy under limited bandwidth. To solve this problem, we introduce JigsawComm, an end-to-end trained, semantic-aware, and communication-efficient CP framework that learns to ``assemble the puzzle'' of multi-agent feature transmission. It uses a regularized encoder to extract semantically relevant and sparse features, and a lightweight Feature Utility Estimator to predict the contribution of each agent's features to the final perception task. The resulting meta utility maps are exchanged among agents and used to compute a provably optimal transmission policy, which, at each spatial location, selects the features of the agent with the highest utility score. This policy inherently eliminates redundancy and keeps the communication cost at $\mathcal{O}(1)$ as the number of agents increases. On the OPV2V and DAIR-V2X benchmarks, JigsawComm reduces the total data volume by more than $500\times$ in the best case while achieving matching or superior accuracy compared to state-of-the-art methods.
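The top-1 per-location policy and its agent-independent cost can be sketched in plain Python. This is a minimal illustration under stated assumptions: each agent's meta utility map is an H×W grid of scalar scores on a shared bird's-eye-view grid, at most one agent may send features for a given cell, and the optional `min_utility` cutoff for skipping uninformative cells is a hypothetical addition. `top1_masks` and `transmitted_cells` are illustrative names, not the paper's code. If the value of a transmission plan is taken to be the sum of the chosen senders' per-cell scores (also an assumption), cells are independent, so picking the per-cell maximum maximizes the total, which is one way to read the optimality claim.

```python
import random
from typing import List

Grid = List[List[float]]  # one agent's H x W meta utility map
Mask = List[List[bool]]   # True where that agent transmits its feature cell


def top1_masks(utility_maps: List[Grid], min_utility: float = float("-inf")) -> List[Mask]:
    """Assign each cell to the single highest-utility agent (ties -> lowest agent index).

    Cells whose best utility is below the hypothetical `min_utility` cutoff are not sent.
    """
    n_agents = len(utility_maps)
    h, w = len(utility_maps[0]), len(utility_maps[0][0])
    masks = [[[False] * w for _ in range(h)] for _ in range(n_agents)]
    for r in range(h):
        for c in range(w):
            # max() returns the first maximal index, so ties go to the lowest index
            best = max(range(n_agents), key=lambda a: utility_maps[a][r][c])
            if utility_maps[best][r][c] >= min_utility:
                masks[best][r][c] = True
    return masks


def transmitted_cells(masks: List[Mask]) -> int:
    """Total number of feature cells sent by all agents combined."""
    return sum(cell for mask in masks for row in mask for cell in row)


if __name__ == "__main__":
    random.seed(0)
    H, W = 4, 5
    for n in (2, 4, 8):
        maps = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(n)]
        sent = transmitted_cells(top1_masks(maps))
        # sent stays at H * W = 20 for every n; send-all grows as n * H * W
        print(f"agents={n}: top-1 sends {sent} cells, send-all would send {n * H * W}")
```

However many agents participate, the combined transmission never exceeds H×W cells, whereas having every agent send its full map costs N×H×W; this is the sense in which the cost is $\mathcal{O}(1)$ in the number of agents.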
