DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification
Yuhao Wang, Yang Liu, Aihua Zheng, Pingping Zhang
TL;DR
DeMo tackles multi-modal object ReID under dynamically varying imaging quality. It decouples modality-specific and shared information and weights the decoupled features with an attention-guided mixture of experts. The approach combines a Patch-Integrated Feature Extractor (PIFE), a Hierarchical Decoupling Module (HDM), and an Attention-Triggered Mixture of Experts (ATMoE) to yield robust, adaptable representations across RGB, NIR, and TIR modalities. Experiments on three benchmarks show strong retrieval performance and robustness to missing modalities, and ablations and visualizations confirm the contribution of each component. The work advances multi-modal ReID by combining decoupled feature design with MoE and attention mechanisms, supporting reliable perception in challenging, modality-heterogeneous environments.
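The summary says decoupled features should be non-overlapping, but it does not say how overlap is measured. As an illustration only, one generic way to quantify overlap between decoupled features is the mean squared cosine similarity over all distinct pairs: it is 0 when the features are mutually orthogonal and approaches 1 when they are identical. The sketch below assumes the decoupled features are stacked as rows of a `(K, d)` array. The function name `orthogonality_penalty` is hypothetical and is not taken from the DeMo code.

```python
import numpy as np

def orthogonality_penalty(feats: np.ndarray, eps: float = 1e-8) -> float:
    """Hypothetical overlap measure for decoupled features (not DeMo's code).

    feats: (K, d) array, one decoupled feature per row.
    Returns the mean squared cosine similarity over all distinct pairs.
    """
    # L2-normalize each row so dot products become cosine similarities.
    normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + eps)
    sim = normed @ normed.T                        # (K, K) cosine similarity matrix
    k = feats.shape[0]
    off_diag = sim[~np.eye(k, dtype=bool)]         # drop self-similarities
    return float(np.mean(off_diag ** 2))
```

Minimizing such a term alongside the ReID loss pushes decoupled features apart. Squaring the similarities treats positive and negative correlation alike, so only orthogonality drives the penalty to zero.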
Abstract
Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by combining complementary information from multiple modalities. Existing multi-modal object ReID methods primarily focus on the fusion of heterogeneous features. However, they often overlook the dynamic quality changes in multi-modal imaging. In addition, the shared information between different modalities can weaken modality-specific information. To address these issues, we propose a novel feature learning framework called DeMo for multi-modal object ReID, which adaptively balances decoupled features using a mixture of experts. Specifically, we first deploy a Patch-Integrated Feature Extractor (PIFE) to extract multi-granularity and multi-modal features. Then, we introduce a Hierarchical Decoupling Module (HDM) to decouple multi-modal features into non-overlapping forms, preserving modality uniqueness and increasing feature diversity. Finally, we propose an Attention-Triggered Mixture of Experts (ATMoE), which replaces traditional gating with dynamic attention weights derived from the decoupled features. With these modules, DeMo generates more robust multi-modal features. Extensive experiments on three multi-modal object ReID benchmarks verify the effectiveness of our method. The source code is available at https://github.com/924973292/DeMo.
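To make the contrast with traditional gating concrete, here is a minimal sketch of an attention-triggered mixture of experts. A conventional MoE computes gate weights with a separate learned gating layer. Here the gate is derived from self-attention among the decoupled features themselves. Assumptions: each decoupled feature has its own linear expert, the gate for expert `i` is the average attention that feature `i` receives, and `w_q`, `w_k`, and `attention_triggered_moe` are hypothetical names. This is a simplified illustration, not the ATMoE implementation from the paper or repository.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_triggered_moe(feats, expert_weights, w_q, w_k):
    """Hypothetical attention-gated MoE over decoupled features.

    feats:          (K, d)         K decoupled features
    expert_weights: (K, d, d_out)  one linear expert per decoupled feature
    w_q, w_k:       (d, d_a)       query/key projections (assumed)
    Returns the fused (d_out,) feature and the (K,) gate weights.
    """
    q = feats @ w_q                                   # (K, d_a)
    k = feats @ w_k                                   # (K, d_a)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # (K, K), each row sums to 1
    # Gate for expert i = mean attention feature i receives from all features.
    # Since every row of attn sums to 1, these column means also sum to 1.
    gate = attn.mean(axis=0)                          # (K,)
    # Expert i processes only its own decoupled feature.
    outs = np.einsum("kd,kdo->ko", feats, expert_weights)  # (K, d_out)
    fused = gate @ outs                               # gate-weighted sum of expert outputs
    return fused, gate

rng = np.random.default_rng(0)
K, d, d_out, d_a = 3, 8, 4, 5
fused, gate = attention_triggered_moe(
    rng.normal(size=(K, d)), rng.normal(size=(K, d, d_out)),
    rng.normal(size=(d, d_a)), rng.normal(size=(d, d_a)),
)
```

Because the gate depends on how the decoupled features attend to each other for each input, the expert weighting changes per sample. This is the kind of input-adaptive balancing the abstract attributes to ATMoE.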
