DiffCP: Ultra-Low Bit Collaborative Perception via Diffusion Model
Ruiqing Mao, Haotian Wu, Yukuan Jia, Zhaojun Nan, Yuxuan Sun, Sheng Zhou, Deniz Gündüz, Zhisheng Niu
TL;DR
DiffCP is a diffusion-based framework for ultra-low-bandwidth collaborative perception (CP). The ego agent reconstructs the co-agent's bird's-eye-view (BEV) features using geometric and semantic conditioning. The co-agent transmits only a compact semantic vector, and a latent BEV diffusion model at the ego agent recovers the co-agent's BEV feature distribution, enabling feature-level collaboration at object-level data rates. On OPV2V, DiffCP achieves about a 14.5-fold reduction in data rate while maintaining state-of-the-art perception performance, and it supports a flexible Top-K augmentation for high-precision tasks. The approach acts as a plug-in to existing BEV-based CP pipelines, easing deployment of connected intelligent systems under congested wireless channels.
Abstract
Collaborative perception (CP) is emerging as a promising solution to the inherent limitations of stand-alone intelligence. However, current wireless communication systems are unable to support feature-level and raw-level collaborative algorithms due to their enormous bandwidth demands. In this paper, we propose DiffCP, a novel CP paradigm that utilizes a specialized diffusion model to efficiently compress the sensing information of collaborators. By incorporating both geometric and semantic conditions into the generative model, DiffCP enables feature-level collaboration at an ultra-low communication cost, advancing the practical implementation of CP systems. This paradigm can be seamlessly integrated into existing CP algorithms to enhance a wide range of downstream tasks. Through extensive experiments, we investigate the trade-offs between communication, computation, and performance. Numerical results demonstrate that DiffCP reduces communication cost 14.5-fold while maintaining the same performance as the state-of-the-art algorithm.
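To make the communication argument concrete, the sketch below compares the raw payload of a dense BEV feature map with that of a compact vector. It is only a back-of-the-envelope illustration: the helper names `payload_bits` and `reduction_factor`, the feature-map shape, the vector length, and the 32-bit precision are all assumptions chosen for the example, not values from the paper. The 14.5-fold figure reported above is an experimental result and is not reproduced by these numbers.

```python
def payload_bits(num_elements: int, bits_per_element: int) -> int:
    """Size of an uncompressed payload in bits."""
    return num_elements * bits_per_element


def reduction_factor(baseline_bits: int, compact_bits: int) -> float:
    """How many times smaller the compact payload is than the baseline."""
    if compact_bits <= 0:
        raise ValueError("compact_bits must be positive")
    return baseline_bits / compact_bits


# Hypothetical shapes, NOT taken from the paper:
# a BEV feature map with 64 channels on a 100 x 100 grid, stored as float32,
# versus a 512-dimensional vector, also stored as float32.
bev_bits = payload_bits(64 * 100 * 100, 32)
vector_bits = payload_bits(512, 32)

print(f"BEV feature map: {bev_bits} bits")
print(f"Compact vector:  {vector_bits} bits")
print(f"Reduction:       {reduction_factor(bev_bits, vector_bits):.1f}x")
```

The point of the comparison is only that a dense feature map grows with channels times grid cells, while a fixed-length vector does not, which is why reconstructing features at the receiver can be attractive when the channel is congested.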
