DiffCP: Ultra-Low Bit Collaborative Perception via Diffusion Model

Ruiqing Mao, Haotian Wu, Yukuan Jia, Zhaojun Nan, Yuxuan Sun, Sheng Zhou, Deniz Gündüz, Zhisheng Niu

TL;DR

DiffCP introduces a diffusion-model–based framework for ultra-low-bandwidth collaborative perception (CP). The ego agent reconstructs the co-agent's bird's-eye-view (BEV) features itself, using geometric and semantic conditioning. The co-agent transmits only a compact semantic vector, and a latent BEV diffusion model at the ego agent recovers the co-agent's BEV feature distribution, enabling feature-level collaboration at object-level data rates. On OPV2V, DiffCP reduces the data rate by about 14.5x while matching state-of-the-art perception performance, and it offers a flexible Top-K augmentation for high-precision tasks. The approach acts as a plug-in to existing BEV-based CP pipelines, facilitating deployment of connected intelligent systems under congested wireless channels.

Abstract

Collaborative perception (CP) is emerging as a promising solution to the inherent limitations of stand-alone intelligence. However, current wireless communication systems are unable to support feature-level and raw-level collaborative algorithms due to their enormous bandwidth demands. In this paper, we propose DiffCP, a novel CP paradigm that utilizes a specialized diffusion model to efficiently compress the sensing information of collaborators. By incorporating both geometric and semantic conditions into the generative model, DiffCP enables feature-level collaboration with an ultra-low communication cost, advancing the practical implementation of CP systems. This paradigm can be seamlessly integrated into existing CP algorithms to enhance a wide range of downstream tasks. Through extensive experimentation, we investigate the trade-offs between communication, computation, and performance. Numerical results demonstrate that DiffCP can significantly reduce communication costs by 14.5-fold while maintaining the same performance as the state-of-the-art algorithm.

Paper Structure

This paper contains 16 sections, 7 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: An illustration of CP in IUSs. Compared with previous feature-level CP, the proposed DiffCP leverages the ego agent's geometric prior to recover feature information from the co-agent. This enables feature-level CP at object-level data rates, satisfying real-world wireless communication constraints.
  • Figure 2: The overall architecture of DiffCP. The size of the feature maps is provided as an example to demonstrate the compression functionality. Left: During the training process, the model is trained using noised BEV features from the co-agent to learn the denoising process. Right: During the inference process, pure noise features are input to reconstruct the co-agent's BEV features, while only the semantic feature vectors are transmitted through the wireless channel.
  • Figure 3: Visualization of the reconstructed BEV features at different sampling steps ($L=512$).
  • Figure 4: Performance under various data rates.
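
The Figure 2 caption describes an asymmetry between training (noised co-agent BEV features are fed to the model) and inference (the model starts from pure noise). The sketch below illustrates only that asymmetry. It assumes a standard DDPM-style linear noise schedule, which the material here does not confirm as DiffCP's choice; the function names, schedule values, and the 8-element stand-in feature are all hypothetical. The denoising network and its geometric and semantic conditioning are not shown.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    The schedule endpoints are common DDPM defaults, assumed here for
    illustration only.
    """
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(feature, t, alpha_bars, rng):
    """Training side: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in feature]
    noised = [
        math.sqrt(a) * x + math.sqrt(1.0 - a) * e
        for x, e in zip(feature, eps)
    ]
    return noised, eps


def initial_inference_input(size, rng):
    """Inference side: start from pure Gaussian noise.

    No co-agent BEV feature is available at the ego agent at this point.
    """
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


rng = random.Random(0)
alpha_bars = make_alpha_bars(1000)

# Hypothetical stand-in for a flattened co-agent BEV feature map.
co_agent_bev = [0.5] * 8

# Training: the model would see x_t and learn to undo the added noise.
x_t, eps = add_noise(co_agent_bev, t=10, alpha_bars=alpha_bars, rng=rng)

# Inference: the reverse process would begin from x_T and be steered by
# the conditions; that denoising loop is outside the scope of this sketch.
x_T = initial_inference_input(len(co_agent_bev), rng)
```

Under this assumed schedule the final cumulative product is close to zero, so a fully noised training sample is almost indistinguishable from the pure noise used at inference. That is why a model trained on noised features can be started from noise alone.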