WaveComm: Lightweight Communication for Collaborative Perception via Wavelet Feature Distillation

Erdemt Bao, Jin Yang

Abstract

In multi-agent collaborative sensing systems, the communication overhead of exchanging information limits scalability and real-time performance, especially in bandwidth-constrained environments, and often degrades performance and reliability. To address this challenge, we propose WaveComm, a wavelet-based communication framework that drastically reduces transmission load while preserving sensing performance in low-bandwidth scenarios. The core innovation of WaveComm is to decompose feature maps with the Discrete Wavelet Transform (DWT) and transmit only the compact low-frequency components, minimizing communication overhead. High-frequency details are omitted, and their effects are reconstructed at the receiver side by a lightweight generator. A Multi-Scale Distillation (MSD) Loss optimizes reconstruction quality at the pixel, structural, semantic, and distributional levels. Experiments on the OPV2V and DAIR-V2X datasets for LiDAR-based and camera-based perception tasks demonstrate that WaveComm maintains state-of-the-art performance even when the communication volume is reduced to 86.3% and 87.0% of the original, respectively. Compared to existing approaches, WaveComm achieves competitive improvements in both communication efficiency and perception accuracy. Ablation studies further validate the effectiveness of its key components.

Paper Structure (16 sections, 8 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Feature transmission methods in collaborative perception. (a) Spatial domain methods: The sender vehicle encodes data, applies compression or selection, and transmits it to the receiver vehicle, which performs feature fusion and decoding. (b) Frequency domain methods: The sender encodes data into low- and high-frequency components, primarily transmitting low-frequency data. The receiver reconstructs the original features from these components, followed by feature fusion and decoding.
  • Figure 2: Overview of WaveComm. WaveComm enables efficient information exchange among intelligent agents to support collaborative autonomous driving. (a) BEV Feature Encoder, which converts agent observations into BEV feature maps. (b) Wavelet Feature Distillation, which employs DWT to decompose features into low- and high-frequency components, followed by a Wavelet Generator and Wavelet Discriminator for efficient feature reconstruction using IDWT. (c) Feature Fusion, which integrates features from multiple agents to enhance the overall feature representation effectively. (d) Detection Head, which produces final detection outputs based on the fused features.
  • Figure 3: Architecture of the Wavelet Generator and Wavelet Discriminator. The Wavelet Generator uses Decoder, Upsample, and Output modules to reconstruct the features $\hat{\mathcal{Z}}_i$ from the transmitted low-frequency component. The Wavelet Discriminator applies a Sigmoid to produce a probability map $P$, as in PatchGAN [isola2017image].
  • Figure 4: Visualization of detection results on the OPV2V and DAIR-V2X datasets under both LiDAR-based and camera-based configurations. Green boxes are ground truth; red boxes are predictions.
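
The frequency-domain pipeline described in the abstract and in Figures 1(b) and 2(b) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices here are assumptions: the Haar wavelet, a single decomposition level, the hypothetical function names `haar_dwt2` and `haar_idwt2`, the feature-map shape, and filling the untransmitted high-frequency bands with zeros as a crude stand-in for the learned Wavelet Generator. It is not intended to reproduce the communication volumes reported in the paper.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar DWT over the last two axes (H and W must be even).

    Returns the low-frequency band and a tuple of three detail bands.
    Naming conventions for the detail bands vary between libraries.
    """
    a = x[..., 0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency (approximation) band
    d1 = (a + b - c - d) / 2.0  # difference between rows
    d2 = (a - b + c - d) / 2.0  # difference between columns
    d3 = (a - b - c + d) / 2.0  # diagonal difference
    return ll, (d1, d2, d3)


def haar_idwt2(ll, details):
    """Inverse of haar_dwt2: rebuilds a map with twice the height and width."""
    d1, d2, d3 = details
    h, w = ll.shape[-2], ll.shape[-1]
    out = np.empty(ll.shape[:-2] + (2 * h, 2 * w), dtype=ll.dtype)
    out[..., 0::2, 0::2] = (ll + d1 + d2 + d3) / 2.0
    out[..., 0::2, 1::2] = (ll + d1 - d2 - d3) / 2.0
    out[..., 1::2, 0::2] = (ll - d1 + d2 - d3) / 2.0
    out[..., 1::2, 1::2] = (ll - d1 - d2 + d3) / 2.0
    return out


# Hypothetical BEV feature map with shape (channels, height, width).
rng = np.random.default_rng(0)
feat = rng.standard_normal((64, 32, 32))

# Sender side: decompose and keep only the low-frequency band for transmission.
ll, details = haar_dwt2(feat)
print("transmitted fraction:", ll.size / feat.size)  # 0.25 for a single level

# Receiver side (assumption): zeros replace the missing detail bands here,
# whereas the paper uses a learned generator to restore their effect.
zeros = tuple(np.zeros_like(band) for band in details)
approx = haar_idwt2(ll, zeros)
print("zero-filled reconstruction error:", np.mean((approx - feat) ** 2))

# With all four bands available, the inverse transform is exact.
print("exact with details:", np.allclose(haar_idwt2(ll, details), feat))
```

With zero-filled detail bands, every 2x2 block of the reconstruction collapses to the block mean, which is the kind of detail loss the generator and the MSD Loss are meant to compensate for.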