NegoCollab: A Common Representation Negotiation Approach for Heterogeneous Collaborative Perception
Congzhang Shao, Quan Yuan, Guiyang Luo, Yue Hu, Danni Wang, Yilin Liu, Rui Pan, Bo Chen, Jinglin Li
TL;DR
This work tackles the challenge of immutable heterogeneity in collaborative perception by introducing NegoCollab, in which a negotiator derives a multimodal common representation during training. Each agent employs a plug-and-play sender-receiver pair to exchange features through this common space, and a cyclic distribution consistency loss ensures that information is preserved in both directions. A multi-dimensional alignment loss, comprising distribution, structural, and pragmatic components, supervises the alignment of local representations to the common representation, enabling effective knowledge distillation into the senders. Empirical results on OPV2V-H, V2V4Real, and DAIR-V2X demonstrate state-of-the-art performance among common-representation methods and competitive results against one-to-one adaptation, along with robustness to localization noise and flexible integration of new agents.
Abstract
Collaborative perception improves task performance by expanding the perception range through information sharing among agents. Immutable heterogeneity poses a significant challenge in collaborative perception, as participating agents may employ different and fixed perception models. This leads to domain gaps in the intermediate features shared among agents, which in turn degrade collaborative performance. Aligning the features of all agents to a common representation can eliminate these domain gaps at low training cost. However, existing methods designate the representation of one specific agent as the common representation, making it difficult for agents with a large domain discrepancy from that agent to achieve proper alignment. This paper proposes NegoCollab, a heterogeneous collaboration method based on a negotiated common representation. It introduces a negotiator during training to derive the common representation from the local representations of the agents of each modality, effectively reducing the inherent domain gap between the common representation and the various local representations. In NegoCollab, features are transformed between the local representation space and the common representation space by a sender-receiver pair. To better align local representations to the common representation, which contains multimodal information, we introduce a structural alignment loss and a pragmatic alignment loss in addition to the distribution alignment loss to supervise training. This enables the knowledge in the common representation to be fully distilled into the sender.
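The sender-receiver round trip described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's implementation: the `LinearAdapter` class, the per-cell linear maps, the channel sizes, and the use of a plain mean squared error as a stand-in for a cyclic consistency measure (the paper's loss is defined on feature distributions).

```python
import numpy as np

rng = np.random.default_rng(0)


class LinearAdapter:
    """Hypothetical stand-in for a sender or receiver: a linear map over channels."""

    def __init__(self, weight):
        self.weight = weight  # shape (c_out, c_in)

    def __call__(self, feat):
        # feat: (c_in, H, W) feature map -> (c_out, H, W), applied at every cell
        return np.einsum("oc,chw->ohw", self.weight, feat)


def cycle_error(sender, receiver, local_feat):
    """Mean squared error after a local -> common -> local round trip."""
    restored = receiver(sender(local_feat))
    return float(np.mean((restored - local_feat) ** 2))


# Assumed sizes: 4 local channels, 8 common channels, a 16x16 feature grid.
c_local, c_common = 4, 8
local_feat = rng.normal(size=(c_local, 16, 16))

w_send = rng.normal(size=(c_common, c_local))
sender = LinearAdapter(w_send)

# A receiver that inverts the sender (via the pseudo-inverse) preserves information.
good_receiver = LinearAdapter(np.linalg.pinv(w_send))
# A receiver unrelated to the sender does not.
bad_receiver = LinearAdapter(rng.normal(size=(c_local, c_common)))

common_feat = sender(local_feat)
good_err = cycle_error(sender, good_receiver, local_feat)
bad_err = cycle_error(sender, bad_receiver, local_feat)
print(common_feat.shape, good_err, bad_err)
```

In this toy setting the round-trip error is near zero only when the receiver undoes the sender, which is the property a cyclic consistency term would encourage during training. Real senders and receivers would be learned networks rather than fixed matrices.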
