CoMamba: Real-time Cooperative Perception Unlocked with State Space Models
Jinlong Li, Xinyu Liu, Baolu Li, Runsheng Xu, Jiachen Li, Hongkai Yu, Zhengzhong Tu
TL;DR
The paper tackles real-time cooperative perception for V2X systems by addressing the scalability and latency limitations of transformer-based fusion. It introduces CoMamba, a linear-complexity framework based on state-space models (Mamba) with two novel modules, CSS2D and GPM, that enable efficient 3D feature fusion across multiple connected agents. Empirical results on OPV2V, V2XSet, and V2V4Real show state-of-the-art detection accuracy at real-time speed (26.9 FPS), with cost that scales linearly as the number of agents increases. This work demonstrates the practicality of SSM-based backbones for large-scale, onboard cooperative perception in intelligent transportation networks.
Abstract
Cooperative perception systems play a vital role in enhancing the safety and efficiency of vehicular autonomy. Although recent studies have highlighted the efficacy of vehicle-to-everything (V2X) communication techniques in autonomous driving, a significant challenge persists: how to efficiently integrate multiple high-bandwidth features across an expanding network of connected agents such as vehicles and infrastructure. In this paper, we introduce CoMamba, a novel cooperative 3D detection framework designed to leverage state-space models for real-time onboard vehicle perception. Compared to prior state-of-the-art transformer-based models, CoMamba is a more scalable 3D model: it uses bidirectional state-space models and thereby avoids the quadratic complexity of attention mechanisms. Through extensive experimentation on V2X/V2V datasets, CoMamba achieves superior performance compared to existing methods while maintaining real-time processing capabilities. The proposed framework not only enhances object detection accuracy but also significantly reduces processing time, making it a promising solution for next-generation cooperative perception systems in intelligent transportation networks.
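To make the linear-complexity claim concrete, here is a minimal sketch of a bidirectional state-space scan over a 1D token sequence. It is not the CoMamba architecture or its CSS2D module: the function names `ssm_scan` and `bidirectional_ssm`, the scalar parameters `a`, `b`, `c`, and the choice to sum the two directions are all assumptions made for illustration, and Mamba itself uses input-dependent (selective) parameters rather than fixed ones. The point is only that each direction visits every token once, so the cost grows linearly with sequence length, whereas attention compares every pair of tokens.

```python
from typing import List


def ssm_scan(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """Run a scalar linear state-space recurrence in a single pass.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    Illustrative only: fixed scalar parameters, one state dimension.
    """
    h = 0.0
    ys = []
    for x in xs:  # one step per token, so O(L) in the sequence length
        h = a * h + b * x
        ys.append(c * h)
    return ys


def bidirectional_ssm(
    xs: List[float], a: float = 0.5, b: float = 1.0, c: float = 1.0
) -> List[float]:
    """Combine a forward scan and a backward scan by summing them.

    Summation is an assumption of this sketch, chosen for simplicity.
    """
    forward = ssm_scan(xs, a, b, c)
    # Scan the reversed sequence, then flip the result back into order.
    backward = ssm_scan(xs[::-1], a, b, c)[::-1]
    return [f + g for f, g in zip(forward, backward)]


if __name__ == "__main__":
    # Hypothetical flattened feature tokens from several agents.
    tokens = [1.0, 0.0, 0.0]
    print(bidirectional_ssm(tokens))
```

If the flattened features of N agents are concatenated into one sequence, the loop above runs once per token in each direction, so doubling the number of agents roughly doubles the work in this sketch.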
