Unity in Diversity: Multi-expert Knowledge Confrontation and Collaboration for Generalizable Vehicle Re-identification
Zhenyu Kuang, Hongyang Zhang, Mang Ye, Bin Yang, Yinhao Liu, Yue Huang, Xinghao Ding, Huafeng Li
TL;DR
The paper tackles domain generalization in vehicle re-identification (ReID), where domain-related redundancy in source images hinders learning. It introduces MiKeCoCo, a two-stage CLIP-based framework that first uses STREAM to produce domain-invariant and style-perturbed inputs, then learns diversified prompts via Multi-expert Knowledge Adversarial Learning (MEKA) and fuses them through a Mixture of Experts (MoE) module with knowledge distillation. The approach yields complementary, high-level semantic features and robust cross-domain identity predictions, achieving state-of-the-art results on multiple vehicle ReID benchmarks. The work shows that combining input-level redundancy elimination with multi-view expert collaboration can significantly improve generalization under domain shift while keeping training and inference practical.
Abstract
Generalizable vehicle re-identification (ReID) seeks to develop models that can adapt to unknown target domains without additional fine-tuning or retraining. Previous works have mainly focused on extracting domain-invariant features by aligning data distributions between source domains. However, because source images contain inherent domain-related redundancy, relying solely on common features is insufficient to accurately capture the complementary features that occur with lower probability and carry less energy. To address this problem, we propose a two-stage Multi-expert Knowledge Confrontation and Collaboration (MiKeCoCo) method, which fully leverages the high-level semantics of Contrastive Language-Image Pretraining (CLIP) to obtain a diversified prompt set and achieve complementary feature representations. Specifically, this paper first designs a Spectrum-based Transformation for Redundancy Elimination and Augmentation Module (STREAM), which uses simple image preprocessing to obtain two types of image inputs for the training process. Since STREAM eliminates domain-related redundancy in source images, it enables the model to pay closer attention to the detailed prompt set that is crucial for distinguishing fine-grained vehicles. This learned prompt set, which is related to vehicle identity, is then used to guide the comprehensive representation learning of complementary features for final knowledge fusion and identity recognition. Inspired by the unity principle, MiKeCoCo integrates the experts' diverse evaluations to ensure accurate and consistent ReID. Extensive experimental results demonstrate that our method achieves state-of-the-art performance.
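
The abstract does not spell out how STREAM computes its two image inputs, so the following is only an illustrative sketch of what a spectrum-based preprocessing step of this kind could look like. It assumes a common Fourier decomposition in which the amplitude spectrum is treated as carrying mostly domain or style information and the phase spectrum as carrying mostly structure. The function name `spectrum_views`, the parameter `noise_std`, and the specific choices of a phase-only reconstruction and multiplicative amplitude noise are all hypothetical and are not taken from the paper.

```python
import numpy as np


def spectrum_views(image, noise_std=0.1, rng=None):
    """Return two views of an H x W x C float image with values in [0, 1].

    Hypothetical sketch, not the authors' STREAM implementation:
      view 1: phase-only reconstruction (amplitude discarded),
      view 2: amplitude-perturbed reconstruction (phase kept).
    """
    rng = np.random.default_rng() if rng is None else rng

    # 2-D FFT over the spatial axes, computed independently per channel.
    spectrum = np.fft.fft2(image, axes=(0, 1))
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)

    # View 1: set every amplitude to 1 and keep only the phase
    # (assumption: this suppresses style-like, domain-related content).
    phase_only = np.fft.ifft2(np.exp(1j * phase), axes=(0, 1)).real
    lo, hi = phase_only.min(), phase_only.max()
    phase_only = (phase_only - lo) / (hi - lo + 1e-8)  # rescale to [0, 1)

    # View 2: scale the amplitude by random factors around 1 and keep the
    # phase (assumption: this simulates a style shift).
    scale = np.maximum(1.0 + noise_std * rng.standard_normal(amplitude.shape), 0.0)
    perturbed = np.fft.ifft2(amplitude * scale * np.exp(1j * phase), axes=(0, 1)).real
    perturbed = np.clip(perturbed, 0.0, 1.0)

    return phase_only, perturbed


if __name__ == "__main__":
    demo = np.random.default_rng(0).random((32, 32, 3))
    invariant_view, augmented_view = spectrum_views(demo, noise_std=0.2)
    print(invariant_view.shape, augmented_view.shape)
```

With `noise_std=0` the second view reduces to the original image, which is a convenient sanity check that the forward and inverse transforms are consistent. In a training pipeline, both views would be fed to the model alongside or in place of the raw image; how the paper actually combines them is not stated in the abstract.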
