From Features to Reference Points: Lightweight and Adaptive Fusion for Cooperative Autonomous Driving

Yongqi Zhu; Morui Zhu; Qi Chen; Deyuan Qu; Song Fu; Qing Yang

TL;DR

This work tackles bandwidth-efficient cooperative perception for autonomous driving by exchanging compact reference points rather than dense feature maps. The proposed RefPtsFusion framework shares interpretable geometric anchors (positions, velocities, sizes) and introduces Selective Top-K query fusion, which augments these cues with high-confidence sender queries under varying network conditions, enabling robust cross-vehicle collaboration across heterogeneous backbones. On the M3CAD dataset, RefPtsFusion achieves perception performance comparable to feature- and query-based fusion while reducing communication by five orders of magnitude, with velocity and size cues further improving temporal and spatial consistency. The approach offers a scalable, real-time solution for cooperative driving with strong robustness and predictable communication behavior, paving the way for practical deployment in diverse vehicle fleets.

Abstract

We present RefPtsFusion, a lightweight and interpretable framework for cooperative autonomous driving. Instead of sharing large feature maps or query embeddings, vehicles exchange compact reference points, e.g., objects' positions, velocities, and sizes. This approach shifts the focus from "what is seen" to "where to see", creating a sensor- and model-independent interface that works well across vehicles with heterogeneous perception models while greatly reducing communication bandwidth. To enrich the shared information, we further develop a selective Top-K query fusion that adds only high-confidence queries from the sender, achieving a strong balance between accuracy and communication cost. Experiments on the M3CAD dataset show that, compared to traditional feature-level fusion methods, RefPtsFusion maintains stable perception performance while reducing communication overhead by five orders of magnitude, from hundreds of MB/s to only a few KB/s at 5 FPS (frames per second). Extensive experiments also demonstrate RefPtsFusion's strong robustness and consistent transmission behavior, highlighting its potential for scalable, real-time cooperative driving systems.
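To give a feel for why reference points are so cheap to transmit, here is a minimal Python sketch of a possible message payload together with a Top-K selection step. It is an illustration under stated assumptions, not the paper's actual format: the `ReferencePoint` fields, the eight-float32 layout, the helper names (`pack_points`, `unpack_points`, `select_top_k`, `bytes_per_second`), and the figure of 50 objects per frame are all hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ReferencePoint:
    # Hypothetical layout: 3D position, planar velocity, 3D box size.
    x: float
    y: float
    z: float
    vx: float
    vy: float
    length: float
    width: float
    height: float


# Assumed wire format: eight little-endian float32 values (32 bytes per point).
_FMT = "<8f"
_POINT_SIZE = struct.calcsize(_FMT)


def pack_points(points: Sequence[ReferencePoint]) -> bytes:
    """Serialize reference points into a flat byte payload."""
    return b"".join(
        struct.pack(_FMT, p.x, p.y, p.z, p.vx, p.vy, p.length, p.width, p.height)
        for p in points
    )


def unpack_points(payload: bytes) -> List[ReferencePoint]:
    """Rebuild reference points from a payload produced by pack_points."""
    return [
        ReferencePoint(*struct.unpack_from(_FMT, payload, offset))
        for offset in range(0, len(payload), _POINT_SIZE)
    ]


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Return indices of the k highest-confidence queries, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[: max(k, 0)]


def bytes_per_second(num_points: int, fps: int = 5) -> int:
    """Payload rate for num_points reference points sent every frame."""
    return num_points * _POINT_SIZE * fps


if __name__ == "__main__":
    pts = [ReferencePoint(1.5, -2.0, 0.5, 3.0, 0.25, 4.5, 2.0, 1.5)]
    assert unpack_points(pack_points(pts)) == pts
    print(select_top_k([0.2, 0.9, 0.5, 0.7], k=2))
    print(bytes_per_second(50), "bytes/s for 50 points at 5 FPS")
```

Under these assumptions, 50 reference points at 5 FPS come to 8,000 bytes per second, which sits in the "few KB/s" range quoted in the abstract. In this sketch `k` is the knob a sender would lower when the network is constrained; how the paper actually chooses K is not specified here.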
