Unity in Diversity: Multi-expert Knowledge Confrontation and Collaboration for Generalizable Vehicle Re-identification

Zhenyu Kuang; Hongyang Zhang; Mang Ye; Bin Yang; Yinhao Liu; Yue Huang; Xinghao Ding; Huafeng Li

TL;DR

The paper tackles domain generalization in vehicle re-identification by addressing domain-related redundancy in source images that hinders learning. It introduces MiKeCoCo, a two-stage CLIP-based framework that first uses STREAM to produce domain-invariant and style-perturbed inputs, then learns diversified prompts via Multi-expert Knowledge Adversarial Learning (MEKA) and fuses them through a Mixture of Experts (MoE) module with knowledge distillation. The approach yields complementary, high-level semantic features and robust cross-domain identity predictions, achieving state-of-the-art results on multiple vehicle ReID benchmarks. The work demonstrates that combining input-level redundancy elimination with multi-view expert collaboration can significantly improve generalization under domain shifts while keeping training and inference efficient.

Abstract

Generalizable vehicle re-identification (ReID) seeks to develop models that can adapt to unknown target domains without additional fine-tuning or retraining. Previous works have mainly focused on extracting domain-invariant features by aligning data distributions between source domains. However, because of the inherent domain-related redundancy in the source images, relying solely on common features is insufficient for accurately capturing complementary features that occur with lower probability and carry less energy. To solve this problem, we propose a two-stage Multi-expert Knowledge Confrontation and Collaboration (MiKeCoCo) method, which fully leverages the high-level semantics of Contrastive Language-Image Pretraining (CLIP) to obtain a diversified prompt set and achieve complementary feature representations. Specifically, we first design a Spectrum-based Transformation for Redundancy Elimination and Augmentation Module (STREAM), which uses simple image preprocessing to obtain two types of image inputs for the training process. Since STREAM eliminates domain-related redundancy in source images, it enables the model to pay closer attention to the detailed prompt set that is crucial for distinguishing fine-grained vehicles. This learned prompt set, which is related to vehicle identity, is then used to guide the comprehensive representation learning of complementary features for final knowledge fusion and identity recognition. Inspired by the unity principle, MiKeCoCo integrates the diverse evaluation approaches of multiple experts to ensure the accuracy and consistency of ReID. Extensive experimental results demonstrate that our method achieves state-of-the-art performance.

Paper Structure (16 sections, 12 equations, 6 figures, 5 tables)

Introduction
Related work
Domain Generalization for ReID
Visual-Language Learning
Proposed method
Problem Statement and Overview
Spectrum-based Transformation for Redundancy Elimination and Augmentation Module
Multi-expert Knowledge Adversarial Learning
Mixture of Experts
Model training and inference
Experiment
Datasets and Evaluation Protocol
Implementation Details
Comparison with State-of-the-Art Methods
Ablation Studies and Analysis
...and 1 more sections

Figures (6)

Figure 1: Backpropagation-based saliency-map visualizations [r95] of CLIP-ReID and our method. CLIP-ReID, which adapts CLIP through single-view prompt learning, does not eliminate the domain-related redundancy of source images, which restricts its diversified feature extraction. Through domain-related redundancy elimination and multi-view adversarial training, our method effectively integrates the complementary features of the different high-level semantics attended to by three different experts.
Figure 2: Schematic diagram of STREAM. Following the conclusion of [yang2020fdas3], the extremely high- and low-frequency components of an image contain the primary domain-related information. $DCT(\bullet)$ and $DCT^{-1}(\bullet)$ represent the forward and inverse discrete cosine transforms, respectively. The domain-invariant image and the style perturbation image serve as the inputs to the first and second training stages of MiKeCoCo, respectively.
Figure 3: STREAM provides the image components needed for each of the two training stages. Since the redundant components of the source image are blocked before the first stage, domain-invariant images can be used for multi-view prompt learning, inspired by adversarial training, to obtain a learnable prompt set related to the vehicle identity. In the second stage, this learned prompt set is combined with style perturbation images to achieve a robust and discriminative feature representation from the image encoder. $L$ indicates the length of the learnable prompt set.
Figure 4: Overview of the first training stage of MiKeCoCo. Since the domain-invariant image (DII) does not contain domain-related redundancy, it enables the model to pay closer attention to the complementary features that are crucial for distinguishing fine-grained vehicles. In this stage, the image and text encoder parameters are fixed, and the goal is to obtain a diversified prompt set by updating the MEKA module parameters.
Figure 5: Overview of the second training stage of MiKeCoCo. Since the style perturbation image (SPI) enhances the diversity of the source-domain dataset, it enables the image encoder of CLIP to extract more robust feature representations. In this stage, multiple experts use different assessment approaches to verify the same vehicle, but their common goal is to confirm the vehicle's true identity. Combining their expertise into a collective decision helps the model ensure the accuracy and consistency of the evaluation results.
...and 1 more figures
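
The Figure 2 caption describes STREAM as a DCT-based transformation in which the extremely high and low frequency components carry most of the domain-related information. The following is a minimal sketch of that idea, not the authors' implementation. The band thresholds `low` and `high`, the diagonal frequency measure, and the random rescaling used to build the style perturbation image are all assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row uses a different normalization
    return c


def dct2(x: np.ndarray) -> np.ndarray:
    """Forward 2D DCT of a single-channel image."""
    h, w = x.shape
    return dct_matrix(h) @ x @ dct_matrix(w).T


def idct2(y: np.ndarray) -> np.ndarray:
    """Inverse 2D DCT; the basis is orthonormal, so the inverse is the transpose."""
    h, w = y.shape
    return dct_matrix(h).T @ y @ dct_matrix(w)


def mid_band_mask(h: int, w: int, low: float = 0.02, high: float = 0.6) -> np.ndarray:
    """True for coefficients in the middle band (thresholds are assumed values)."""
    u = np.arange(h)[:, None] / h
    v = np.arange(w)[None, :] / w
    r = (u + v) / 2.0  # simple normalized frequency index in [0, 1)
    return (r >= low) & (r <= high)


def stream_sketch(img, low=0.02, high=0.6, noise=0.5, seed=0):
    """Return (domain-invariant image, style perturbation image) for one channel."""
    rng = np.random.default_rng(seed)
    coeffs = dct2(img)
    mask = mid_band_mask(*img.shape, low, high)
    # Assumption: DII drops the extreme low/high bands entirely.
    dii = idct2(coeffs * mask)
    # Assumption: SPI keeps the mid band and randomly rescales the extreme bands.
    scale = 1.0 + noise * rng.uniform(-1.0, 1.0, size=coeffs.shape)
    spi = idct2(np.where(mask, coeffs, coeffs * scale))
    return dii, spi
```

For a colour image, the sketch would be applied per channel. Because the DC coefficient falls in the removed low band here, the resulting domain-invariant image has zero mean and would need rescaling before being fed to an image encoder.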
