
NegoCollab: A Common Representation Negotiation Approach for Heterogeneous Collaborative Perception

Congzhang Shao, Quan Yuan, Guiyang Luo, Yue Hu, Danni Wang, Yilin Liu, Rui Pan, Bo Chen, Jinglin Li

TL;DR

This work tackles the challenge of immutable heterogeneity in collaborative perception by introducing NegoCollab, which negotiates a multimodal common representation during training using a negotiator. Each agent employs a plug‑and‑play sender–receiver pair to exchange features through this common space, with a cyclic distribution consistency loss ensuring bidirectional information preservation. A multi‑dimensional alignment loss—comprising distribution, structural, and pragmatic components—supervises the alignment of local representations to the common representation, enabling effective knowledge distillation into senders. Empirical results on OPV2V-H, V2V4Real, and DAIR-V2X demonstrate state‑of‑the‑art performance among common‑representation methods and competitive results against one‑to‑one adaptation, while showing robustness to localization noise and flexible integration of new agents.

Abstract

Collaborative perception improves task performance by expanding the perception range through information sharing among agents. Immutable heterogeneity poses a significant challenge in collaborative perception, as participating agents may employ different and fixed perception models. This leads to domain gaps in the intermediate features shared among agents, consequently degrading collaborative performance. Aligning the features of all agents to a common representation can eliminate domain gaps with low training cost. However, in existing methods, the common representation is designated as the representation of a specific agent, making it difficult for agents with significant domain discrepancies from this specific agent to achieve proper alignment. This paper proposes NegoCollab, a heterogeneous collaboration method based on a negotiated common representation. It introduces a negotiator during training to derive the common representation from the local representations of each modality's agent, effectively reducing the inherent domain gap with the various local representations. In NegoCollab, the mutual transformation of features between the local representation space and the common representation space is achieved by a sender-receiver pair. To better align local representations to the common representation containing multimodal information, we introduce a structural alignment loss and a pragmatic alignment loss in addition to the distribution alignment loss to supervise the training. This enables the knowledge in the common representation to be fully distilled into the sender.
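
The sender-receiver idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Agent` class, the plain linear maps, and the mean-squared round-trip error are all assumptions chosen for brevity, and the round-trip error is only a simple stand-in for the paper's cyclic distribution consistency loss. The sketch shows one agent mapping a local feature into a shared space and a second agent mapping it into its own local space.

```python
# Toy sketch only: every name here is hypothetical, not taken from the paper.

def linear(weights, x):
    """Apply a weight matrix (given as a list of rows) to a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


class Agent:
    """An agent with a fixed local space plus a sender and a receiver map."""

    def __init__(self, sender_w, receiver_w):
        self.sender_w = sender_w      # local space -> common space
        self.receiver_w = receiver_w  # common space -> local space

    def send(self, local_feat):
        return linear(self.sender_w, local_feat)

    def receive(self, common_feat):
        return linear(self.receiver_w, common_feat)


def cycle_loss(agent, local_feat):
    """Round-trip error: local -> common -> local (assumed MSE stand-in)."""
    return mse(agent.receive(agent.send(local_feat)), local_feat)


def scale(s):
    """2x2 diagonal matrix; a deliberately trivial 'domain' transformation."""
    return [[s, 0.0], [0.0, s]]


# Agent A's local space is the common space scaled by 0.5, agent B's by 0.25.
agent_a = Agent(sender_w=scale(2.0), receiver_w=scale(0.5))
agent_b = Agent(sender_w=scale(4.0), receiver_w=scale(0.25))
# A sender and receiver that do not invert each other lose information.
mismatched = Agent(sender_w=scale(2.0), receiver_w=scale(1.0))

feat_a = [1.0, 2.0]
shared = agent_a.send(feat_a)           # feature in the common space
feat_in_b = agent_b.receive(shared)     # same feature in B's local space
```

In this toy setup, `cycle_loss(agent_a, feat_a)` is zero because A's sender and receiver invert each other, while the mismatched pair gives a nonzero round-trip error. Each agent only needs its own pair of maps to talk to every other agent, which is the appeal of a common representation over one adapter per agent pair.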

Paper Structure

This paper contains 31 sections, 15 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Two paradigms for eliminating domain gaps. The method in (a) eliminates the domain gap by adapting domain adaptation modules between every pair of collaborating agents. The methods in (b) and (c) both eliminate domain gaps by unifying the representations of each agent into the common representation, where the common representation in (b) is designated as the local representation of a specific agent, and the common representation in (c) is negotiated from the local representations of each modality’s agent.
  • Figure 2: Overview of NegoCollab. Each agent shares features in the negotiated common representation space. Through the sender-receiver pairs, the features are mutually converted between local representation space and the common representation space, thereby enabling the mutual transformation of features across modalities and eliminating domain gaps.
  • Figure 3: Robustness Analysis of Localization Errors. Pose noise is set to $\mathcal{N}\left(0, \sigma^2\right)$ on both the $x$, $y$ location and the yaw angle. The collaborating agents are m1 and m2.
  • Figure 4: Comparison of domain gaps between local and common representation.
  • Figure 5: Training process of initial alliance negotiation.
  • ...and 3 more figures