Residual Vector Quantization For Communication-Efficient Multi-Agent Perception
Dereje Shenkut, B. V. K. Vijaya Kumar
TL;DR
The paper tackles the bandwidth bottleneck in multi-agent collaborative perception (CP) by introducing ReVQom, a learned feature codec that compresses bird's-eye-view (BEV) features while preserving their spatial layout. A simple channel-reducing bottleneck is followed by multi-stage residual vector quantization with shared codebooks, so only per-pixel code indices are transmitted, at a rate of $R = n_q \log_2 K$ bits per location, where $n_q$ is the number of quantization stages and $K$ is the codebook size. Experiments on DAIR-V2X and OPV2V show compression of up to $1365\times$ with competitive detection performance and graceful degradation at ultra-low bitrates, a step toward practical V2X deployment. The results indicate that aggressive, index-based compression can maintain BEV fusion quality and scalability in real-world CP scenarios.
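The rate formula can be checked with a few lines of arithmetic. The sketch below is illustrative only: the codebook size `K = 64` is an assumption (the summary does not state it), chosen because it yields 6 bits per stage and therefore reproduces the 6-30 bpp range with one to five stages. The 8192 bpp baseline is the uncompressed figure quoted in the abstract.

```python
import math

RAW_BPP = 8192  # uncompressed 32-bit float features, as quoted in the abstract


def bits_per_location(n_q: int, K: int) -> float:
    """Rate R = n_q * log2(K): one index of log2(K) bits per RVQ stage."""
    return n_q * math.log2(K)


def compression_ratio(raw_bpp: float, coded_bpp: float) -> float:
    """How many times smaller the coded payload is than the raw one."""
    return raw_bpp / coded_bpp


K = 64  # hypothetical codebook size, not taken from the paper
for n_q in range(1, 6):
    rate = bits_per_location(n_q, K)
    ratio = compression_ratio(RAW_BPP, rate)
    print(f"n_q={n_q}: {rate:.0f} bpp, about {ratio:.0f}x compression")
```

Under this assumption, one stage gives 6 bpp (about 1365x) and five stages give 30 bpp (about 273x), matching the endpoints reported in the abstract. Other combinations of $n_q$ and $K$ can produce the same rates.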
Abstract
Multi-agent collaborative perception (CP) improves scene understanding by sharing information across connected agents such as autonomous vehicles, unmanned aerial vehicles, and robots. Communication bandwidth, however, constrains scalability. We present ReVQom, a learned feature codec that preserves spatial identity while compressing intermediate features. ReVQom is an end-to-end method that compresses feature dimensions via a simple bottleneck network followed by multi-stage residual vector quantization (RVQ). As a result, only per-pixel code indices need to be transmitted, reducing the payload from 8192 bits per pixel (bpp) for uncompressed 32-bit float features to 6-30 bpp per agent with minimal accuracy loss. On the real-world DAIR-V2X CP dataset, ReVQom achieves compression ratios ranging from 273x at 30 bpp to 1365x at 6 bpp. At 18 bpp (455x), ReVQom matches or outperforms raw-feature CP, and at 6-12 bpp it enables ultra-low-bandwidth operation with graceful degradation. ReVQom thus enables efficient and accurate multi-agent collaborative perception and is a step toward practical V2X deployment.
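To make the index-only transmission concrete, here is a minimal sketch of generic residual vector quantization for a single per-pixel feature vector. It is not the authors' implementation: the bottleneck network is omitted, the codebooks are random rather than learned, and the names `rvq_encode`, `rvq_decode`, `DIM`, `K`, and `N_Q` are hypothetical. Each stage quantizes the residual left by the previous stage; the sender transmits only the stage indices, and the receiver sums the matching codewords from the same shared codebooks.

```python
import random


def nearest(codebook, v):
    """Index of the codeword closest to v in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - x) ** 2 for c, x in zip(codebook[k], v)),
    )


def rvq_encode(v, codebooks):
    """Return one index per stage; each stage quantizes the running residual."""
    residual = list(v)
    indices = []
    for cb in codebooks:
        k = nearest(cb, residual)
        indices.append(k)
        residual = [r - c for r, c in zip(residual, cb[k])]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct the vector by summing the selected codeword of each stage."""
    out = [0.0] * len(codebooks[0][0])
    for k, cb in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, cb[k])]
    return out


# Toy setup (all values are assumptions for illustration, not from the paper).
rng = random.Random(0)
DIM, K, N_Q = 8, 64, 3
codebooks = [
    [[rng.gauss(0.0, 1.0 / (s + 1)) for _ in range(DIM)] for _ in range(K)]
    for s in range(N_Q)
]
feature = [rng.gauss(0.0, 1.0) for _ in range(DIM)]

indices = rvq_encode(feature, codebooks)   # what the sender would transmit
recon = rvq_decode(indices, codebooks)     # what the receiver would rebuild
print(indices, len(recon))
```

With `K = 64` and three stages, the payload for this location is three 6-bit indices, i.e. 18 bits, regardless of the feature dimension. In a trained codec the codebooks would be learned jointly with the bottleneck so that the reconstruction error stays small.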
