One is Plenty: A Polymorphic Feature Interpreter for Immutable Heterogeneous Collaborative Perception

Yuchen Xia, Quan Yuan, Guiyang Luo, Xiaoyuan Fu, Yang Li, Xuanhan Zhu, Tianyou Luo, Siheng Chen, Jinglin Li

TL;DR

This work tackles the challenge of immutable heterogeneous collaborative perception by introducing PolyInter, a polymorphic feature interpreter that achieves one-stage semantic alignment across diverse neighbor agents using a shared interpreter, a general prompt, and agent-specific prompts. The approach relies on two training phases: Phase I learns shared and agent-specific semantics with a channel selection and spatial attention mechanism, while Phase II enables generalization to new agents by fine-tuning only the new agent prompts and a resizer. PolyInter demonstrates superior accuracy over state-of-the-art interpreters on OPV2V and shows strong extensibility with minimal parameter updates, reducing retraining and storage needs in open multi-agent settings. The method is validated across multiple datasets and modalities, indicating practical impact for scalable, efficient collaborative perception in autonomous systems.
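The Phase II idea described above (keep the shared interpreter fixed and update only the new agent's prompt and a resizer) can be sketched as a parameter-selection step. This is a minimal illustration, not the authors' code: the parameter names, the naming scheme, and every size below are made-up assumptions chosen so the arithmetic is easy to follow.

```python
from typing import Dict, List


def phase2_trainable(param_sizes: Dict[str, int], new_agent: str) -> List[str]:
    """Names of the parameters left trainable when adapting to `new_agent`.

    Assumes a hypothetical naming scheme: 'specific_prompt.<agent>' and 'resizer'.
    Everything else (shared interpreter, general prompt, other prompts) stays frozen.
    """
    wanted = {f"specific_prompt.{new_agent}", "resizer"}
    return [name for name in param_sizes if name in wanted]


def trainable_fraction(param_sizes: Dict[str, int], trainable: List[str]) -> float:
    """Share of all parameters that would receive gradient updates."""
    total = sum(param_sizes.values())
    return sum(param_sizes[name] for name in trainable) / total


# Hypothetical parameter inventory; the sizes are illustrative, not from the paper.
params = {
    "interpreter.channel_selection": 500,
    "interpreter.spatial_attention": 400,
    "general_prompt": 50,
    "specific_prompt.agent_a": 20,
    "specific_prompt.agent_new": 20,
    "resizer": 10,
}

trainable = phase2_trainable(params, "agent_new")
fraction = trainable_fraction(params, trainable)
print(trainable, fraction)
```

With these invented sizes, only the two selected entries are updated, so the trainable share is small relative to the full interpreter; the actual share depends on the real parameter counts.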

Abstract

Collaborative perception in autonomous driving significantly enhances the perception capabilities of individual agents. Immutable heterogeneity, where agents have different and fixed perception networks, presents a major challenge: the semantic gap in exchanged intermediate features must be bridged without modifying the perception networks. Most existing methods bridge this gap through interpreters. However, they either require training a new interpreter for each new agent type, which limits extensibility, or rely on a two-stage interpretation via an intermediate standardized semantic space, which causes cumulative semantic loss. To achieve both extensibility in immutable heterogeneous scenarios and low-loss feature interpretation, we propose PolyInter, a polymorphic feature interpreter. It provides an extension point where new agents integrate by overriding only their specific prompts, which are learnable parameters that guide interpretation, while reusing PolyInter's remaining parameters. By leveraging polymorphism, our design enables a single interpreter to accommodate diverse agents and interpret their features into the ego agent's semantic space. Experiments on the OPV2V dataset demonstrate that PolyInter improves collaborative perception precision by up to 11.1% compared to state-of-the-art interpreters, while comparable results can be achieved by training only 1.4% of PolyInter's parameters when adapting to new agents. Code is available at https://github.com/yuchen-xia/PolyInter.
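The extension-point pattern in the abstract (one shared interpreter, with each agent type overriding only its own prompt) can be illustrated with a toy class. This is a sketch under loud assumptions: the class name, the method names, and the arithmetic inside `interpret` are hypothetical stand-ins, and the real interpreter uses learned modules rather than this elementwise formula.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Vector = List[float]


@dataclass
class ToyPolymorphicInterpreter:
    """Toy stand-in: shared parameters plus one prompt per neighbor agent type."""

    shared_weight: Vector   # shared parameters, reused by every agent type
    general_prompt: Vector  # prompt common to all neighbor agents
    specific_prompts: Dict[str, Vector] = field(default_factory=dict)

    def register_agent(self, agent_type: str, prompt: Vector) -> None:
        """Extension point: a new agent type contributes only its own prompt."""
        if len(prompt) != len(self.shared_weight):
            raise ValueError("prompt length must match the feature dimension")
        self.specific_prompts[agent_type] = list(prompt)

    def interpret(self, agent_type: str, feature: Vector) -> Vector:
        """Map a neighbor feature toward the ego space (illustrative arithmetic only)."""
        specific = self.specific_prompts[agent_type]
        return [
            w * (x + g + s)
            for w, x, g, s in zip(
                self.shared_weight, feature, self.general_prompt, specific
            )
        ]


interp = ToyPolymorphicInterpreter(shared_weight=[2.0, 2.0], general_prompt=[1.0, 0.0])
interp.register_agent("agent_a", [0.5, 0.5])
interp.register_agent("agent_b", [0.0, 1.0])  # added later; shared parts untouched

out_a = interp.interpret("agent_a", [1.0, 1.0])
out_b = interp.interpret("agent_b", [1.0, 1.0])
print(out_a, out_b)
```

The same `interpret` call serves both agent types; only the prompt looked up by `agent_type` differs, which is the sense in which a single interpreter behaves polymorphically.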

Paper Structure

This paper contains 29 sections, 11 equations, 7 figures, 11 tables.

Figures (7)

  • Figure 1: Comparison of different immutable heterogeneous collaborative strategies for extending collaboration with new neighbor agents. (a) interprets neighbor features directly into the ego agent’s semantic space in a one-stage interpretation. (b) requires a two-stage feature interpretation for each collaboration, using a standard semantic space. (c) leverages a polymorphic feature interpreter, requiring only prompt tuning for each new neighbor agent. The blue areas are on the ego agent, while the yellow areas are on the neighbor agents.
  • Figure 1: Visualization of the ego feature, the general prompt, the specific prompts corresponding to different neighbor agents, and the process of interpreting neighbor features into the ego agent's semantic space.
  • Figure 2: The overall architecture of PolyInter. PolyInter establishes a common structure that can be inherited by multiple agents, providing an extension point for customizing the specific prompts of each agent. This interpreter incorporates both a Channel Selection Module and a Spatial Attention Module to facilitate feature semantic interpretation.
  • Figure 3: Visualization of BEV feature maps from two heterogeneous encoders along the corresponding channel dimension: PointPillar (Lang et al., CVPR 2019) on the left and VoxelNet (Zhou and Tuzel, CVPR 2018) on the right. The numbers in the middle represent channel-wise similarity.
  • Figure 4: Comparison of the number of trainable parameters with PnPDA and MPDA when incrementally adding new heterogeneous neighbor agents.
  • ...and 2 more figures