STAMP: Scalable Task And Model-agnostic Collaborative Perception

Xiangbo Gao, Runsheng Xu, Jiachen Li, Ziran Wang, Zhiwen Fan, Zhengzhong Tu

TL;DR

STAMP tackles heterogeneous multi-agent perception by introducing adapters and reverters that translate local Bird's Eye View (BEV) features into a shared protocol domain, enabling scalable, task- and model-agnostic collaboration without sharing models. The Collaborative Feature Alignment (CFA) framework jointly trains a protocol BEV embedding and lightweight adapters/reverters, enforcing alignment in both feature space and decision space. Empirical results on OPV2V and V2V4Real show STAMP achieving comparable or superior accuracy with significantly lower per-agent training costs as the number of agents grows, and strong robustness to noise. The work demonstrates a practical path toward secure, scalable multi-agent perception and discusses multi-group collaboration as a promising direction to mitigate bottlenecks in heterogeneous collaborative perception (CP) systems.
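To make the "alignment in both feature space and decision space" idea concrete, here is a minimal sketch of a two-term objective. It is not the paper's exact formulation: the mean-squared-error form of each term, the weights `w_f` and `w_d`, and the function names `mse` and `cfa_loss` are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def mse(a: List[float], b: List[float]) -> float:
    """Mean squared error between two equal-length flat vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cfa_loss(
    adapted_feat: List[float],
    protocol_feat: List[float],
    adapted_out: List[float],
    protocol_out: List[float],
    w_f: float = 1.0,  # hypothetical weight on the feature-space term
    w_d: float = 1.0,  # hypothetical weight on the decision-space term
) -> Tuple[float, float, float]:
    """Return (total, l_f, l_d) for a toy two-term alignment objective.

    l_f compares features directly (feature space); l_d compares the
    outputs decoded from those features (decision space). Both use MSE
    here purely as an assumption.
    """
    l_f = mse(adapted_feat, protocol_feat)
    l_d = mse(adapted_out, protocol_out)
    return w_f * l_f + w_d * l_d, l_f, l_d


# Toy values: two feature entries and one decoded output per side.
total, l_f, l_d = cfa_loss([1.0, 2.0], [1.0, 4.0], [0.5], [1.0])
```

The point of keeping two terms is that features can be close in value and still decode to different predictions, so a decision-space term penalizes mismatches that a feature-space term alone might miss.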

Abstract

Perception is crucial for autonomous driving, but single-agent perception is often constrained by sensors' physical limitations, leading to degraded performance under severe occlusion, in adverse weather conditions, and when detecting distant objects. Multi-agent collaborative perception offers a solution, yet challenges arise when integrating heterogeneous agents with varying model architectures. To address these challenges, we propose STAMP, a scalable, task- and model-agnostic collaborative perception pipeline for heterogeneous agents. STAMP utilizes lightweight adapter-reverter pairs to transform Bird's Eye View (BEV) features between agent-specific and shared protocol domains, enabling efficient feature sharing and fusion. This approach minimizes computational overhead, enhances scalability, and preserves model security. Experiments on simulated and real-world datasets demonstrate STAMP's comparable or superior accuracy to state-of-the-art models with significantly reduced computational costs. As a first-of-its-kind task- and model-agnostic framework, STAMP aims to advance research in scalable and secure mobility systems towards Level 5 autonomy. Our project page is at https://xiangbogaobarry.github.io/STAMP and the code is available at https://github.com/taco-group/STAMP.
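The adapter-reverter flow described in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the released implementation: adapters and reverters are modeled as per-cell linear channel maps (a bias-free 1x1 convolution), fusion is an element-wise max, and the weights are hand-picked rather than learned. The names `channel_map`, `fuse_max`, `ADAPTER_1`, and `REVERTER_2` are hypothetical.

```python
from typing import List

Bev = List[List[List[float]]]  # H x W x C grid of BEV feature cells
Weights = List[List[float]]    # out_channels x in_channels


def channel_map(weights: Weights, bev: Bev) -> Bev:
    """Apply the same linear channel map to every BEV cell."""
    return [
        [[sum(w * x for w, x in zip(w_row, cell)) for w_row in weights]
         for cell in row]
        for row in bev
    ]


def fuse_max(a: Bev, b: Bev) -> Bev:
    """Element-wise max fusion of two BEV grids of identical shape."""
    return [
        [[max(x, y) for x, y in zip(cell_a, cell_b)]
         for cell_a, cell_b in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


# Agent 1 has 3 local channels, the protocol has 2, Agent 2 has 2.
ADAPTER_1: Weights = [[1.0, 0.0, 1.0], [0.0, 0.5, 0.0]]  # local-1 -> protocol
REVERTER_2: Weights = [[0.0, 1.0], [1.0, 0.0]]           # protocol -> local-2

bev_agent1: Bev = [[[1.0, 4.0, 1.0], [0.0, 2.0, 0.0]]]   # 1 x 2 x 3
bev_agent2: Bev = [[[3.0, 0.5], [0.0, 0.0]]]             # 1 x 2 x 2

# Agent 1 shares only protocol-domain features, never its model.
protocol_msg = channel_map(ADAPTER_1, bev_agent1)

# Agent 2 reverts the message into its own feature domain, then fuses.
received = channel_map(REVERTER_2, protocol_msg)
fused = fuse_max(bev_agent2, received)
```

In this sketch each agent needs only its own adapter and reverter against the shared protocol, so adding an agent does not require retraining or exposing any other agent's model.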

Paper Structure

This paper contains 44 sections, 8 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: Initially, agents are non-collaborative (I), resulting in degraded performance. Collaborative Feature Alignment (CFA) enables collaboration among heterogeneous agents through a two-step process (II): training a protocol network and training local adapters and reverters. The protocol network facilitates communication between Agent 1, Agent 2, and Agent 3, each with heterogeneous models and features. Gradient-colored feature maps represent features adapted or reverted between domains. After CFA implementation, agents become collaborative (III) with improved performance.
  • Figure 2: Training efficiency comparison between our framework and existing heterogeneous CP frameworks as the number of heterogeneous agents varies.
  • Figure 3: Ablation studies on the OPV2V dataset: (a) Model performance across different BEV feature channel sizes. (b) Performance comparison of various adapter and reverter architectures. (c) Performance results using different combinations of loss function components ($L_f$ and $L_d$).
  • Figure 4: Ablation studies on the V2V4Real dataset.
  • Figure 5: Visualization of feature maps and model outputs before and after Collaborative Feature Alignment (CFA) for two scenes with different agents and tasks. For 3D object detection, green boxes indicate the ground truth labels and red boxes indicate the predictions. CFA enhances feature clarity and information preservation, resulting in improved perception accuracy across heterogeneous agents.
  • ...and 5 more figures