CoSense3D: an Agent-based Efficient Learning Framework for Collective Perception

Yunshuang Yuan; Monika Sester

TL;DR

CoSense3D introduces an agent-based training framework to tackle the heavy data and gradient load of multi-agent collective perception. By formalizing per-CAV roles and enabling per-agent gradient control within a GPU-accelerated central pipeline, the method significantly reduces training memory and time while preserving inference accuracy on the OPV2V benchmark, as demonstrated across AttnFusion, FPVRCNN, F-Cooper, and EviBEV. Key findings show that gradient-aware fusion modules that keep their learnable features yield substantial efficiency gains (e.g., memory reductions of around 56% for dense fusion) with minimal AP loss, whereas non-learnable or naive fusion can incur performance drops. This work offers a practical path to scalable development of cooperative perception systems for autonomous driving, supported by an open-source framework and a demonstration on a representative dataset.

Abstract

Collective perception has attracted significant attention in recent years because it mitigates occlusion and expands the field of view, thereby enhancing reliability, efficiency, and, most crucially, decision-making safety. However, developing collective perception models is highly resource-demanding because input data must be processed for many agents, usually dozens of images and point clouds for a single frame. This not only slows down model development for collective perception but also impedes the use of larger models. In this paper, we propose an agent-based training framework that handles the deep learning modules and agent data separately for a cleaner data flow structure. The framework provides an API for flexibly prototyping the data processing pipeline and defining the gradient calculation for each agent, as well as a user interface for interactive training, testing, and data visualization. Training experiments with four collective object detection models on the prominent collective perception benchmark OPV2V show that agent-based training can significantly reduce GPU memory consumption and training time while retaining inference performance. The framework and model implementations are available at https://github.com/YuanYunshuang/CoSense3D
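
To make the idea of per-agent gradient calculation concrete, the following is a minimal sketch of how a training step might decide which agents are tracked for gradients. It is not the CoSense3D API: `AgentPlan`, `plan_gradients`, `split_by_gradient`, and the `num_grad_cav` parameter are hypothetical names introduced here, and the rule that the ego agent always keeps gradients is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AgentPlan:
    """Hypothetical per-agent training plan (illustrative, not the CoSense3D API)."""
    agent_id: str
    is_ego: bool
    requires_grad: bool


def plan_gradients(agent_ids: List[str], ego_id: str, num_grad_cav: int) -> List[AgentPlan]:
    """Track gradients for the ego agent and at most `num_grad_cav` cooperative agents.

    Assumption: the ego agent always participates in backpropagation; the
    remaining agents are processed in list order until the budget is spent.
    """
    if ego_id not in agent_ids:
        raise ValueError(f"ego agent {ego_id!r} is not in agent_ids")
    if num_grad_cav < 0:
        raise ValueError("num_grad_cav must be non-negative")

    plans: List[AgentPlan] = []
    remaining = num_grad_cav
    for agent_id in agent_ids:
        is_ego = agent_id == ego_id
        if is_ego:
            track = True
        elif remaining > 0:
            track = True
            remaining -= 1
        else:
            track = False
        plans.append(AgentPlan(agent_id, is_ego, track))
    return plans


def split_by_gradient(plans: List[AgentPlan]) -> Dict[str, List[str]]:
    """Group agent ids by whether their forward pass should be tracked."""
    groups: Dict[str, List[str]] = {"with_grad": [], "no_grad": []}
    for plan in plans:
        key = "with_grad" if plan.requires_grad else "no_grad"
        groups[key].append(plan.agent_id)
    return groups


if __name__ == "__main__":
    plans = plan_gradients(["ego", "cav1", "cav2", "cav3"], ego_id="ego", num_grad_cav=1)
    print(split_by_gradient(plans))
```

In a PyTorch training loop, the `no_grad` group could be run inside a `torch.no_grad()` context so that their intermediate activations are not stored for backpropagation, which is one plausible source of the memory and time savings described in the abstract.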

Paper Structure (12 sections, 5 figures, 1 table)

This paper contains 12 sections, 5 figures, 1 table.

Introduction
CoSense3D Framework
Formalization of Collective Perception
Framework Structure
Experiments
State-of-the-art Networks
Dataset
Experiment Settings
Result and Evaluation
Average Precision (AP) of Object detection
Efficiency of Agent-based Learning
Conclusions

Figures (5)

Figure 1: CoSense3D: Agent-based training framework. Black arrows indicate the instruction passing direction, green arrows show the data passing direction.
Figure 2: Workflow of CoSense3D Central Controller.
Figure 3: CoSense3D GUI.
Figure 4: Collective perception pipeline with agent-based training. Blue and gray blocks show the data flow with and without gradient calculation, respectively. Sketched blocks with blue stripes are the shared deep learning models.
Figure 5: Correlation between object detection performance and GPU memory usage (left) as well as training time (right) with different gradient configurations.
