Reliable Multi-Modal Object Re-Identification via Modality-Aware Graph Reasoning
Xixi Wan, Aihua Zheng, Zi Wang, Bo Jiang, Jin Tang, Jixin Ma
TL;DR
This work tackles multi-modal object ReID when local-feature quality varies and some modalities are missing. It introduces the Modality-aware Graph Reasoning Network (MGRNet), which combines modality-aware patch graphs, Selective Graph Nodes Swap (SGNS), and two graph-reasoning modules (GRMI for modal interaction and GRMM for missing-modality reconstruction) with global-aware multi-head attention for robust fusion. The model is trained with a multi-modality reconstruction loss that jointly constrains feature and structure consistency, and it achieves state-of-the-art results on four benchmarks (RGBNT201, Market1501-MM, RGBNT100, MSVR310). Empirical analyses indicate that SGNS mitigates the effect of low-quality local features, GRMM recovers missing modalities from structural cues, and GRMI improves cross-modal information exchange, giving strong performance in both complete and incomplete modality settings, which is relevant to real-world ReID systems.
Abstract
Multi-modal data provides abundant and diverse object information, which is crucial for effective modal interaction in Re-Identification (ReID) tasks. However, existing approaches often overlook quality variations in local features and fail to fully leverage the complementary information across modalities, particularly when some features are of low quality. In this paper, we address this issue with a novel graph reasoning model, termed the Modality-aware Graph Reasoning Network (MGRNet). Specifically, we first construct modality-aware graphs that capture and model the relationships between patches, enhancing the extraction of fine-grained local details. Subsequently, a selective graph nodes swap operation alleviates the adverse effects of low-quality local features by considering both local and global information, enhancing the representation of discriminative information. Finally, the swapped modality-aware graphs are fed into the local-aware graph reasoning module, which propagates multi-modal information to yield a reliable feature representation. Another advantage of the proposed graph reasoning approach is its ability to reconstruct missing modal information by exploiting inherent structural relationships, thereby minimizing disparities between different modalities. Experimental results on four benchmarks (RGBNT201, Market1501-MM, RGBNT100, MSVR310) indicate that the proposed method achieves state-of-the-art performance in multi-modal object ReID. The code for our method will be available upon acceptance.
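The first two steps of the pipeline described above (building a per-modality patch graph, then swapping out low-quality nodes) can be illustrated with a rough NumPy sketch. This is not the authors' implementation, and the abstract does not specify the graph construction, the quality score, or the swap rule. Everything below is an assumption made for illustration: the k-nearest-neighbor cosine graph, the "similarity to the mean patch" quality score, the `ratio` parameter, and the function names `knn_patch_graph`, `patch_quality`, and `selective_node_swap`.

```python
import numpy as np


def knn_patch_graph(patches, k=4):
    """Assumed construction: symmetric kNN adjacency from cosine similarity."""
    normed = patches / (np.linalg.norm(patches, axis=1, keepdims=True) + 1e-8)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # never pick a patch as its own neighbor
    neighbors = np.argsort(-sim, axis=1)[:, :k]  # k most similar patches per row
    adj = np.zeros(sim.shape)
    rows = np.repeat(np.arange(len(patches)), k)
    adj[rows, neighbors.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected


def patch_quality(patches):
    """Hypothetical quality score: cosine similarity to the mean (global) feature."""
    global_feat = patches.mean(axis=0)
    num = patches @ global_feat
    den = np.linalg.norm(patches, axis=1) * np.linalg.norm(global_feat) + 1e-8
    return num / den


def selective_node_swap(modalities, ratio=0.25):
    """Hypothetical swap rule over a dict of name -> (N, D) patch features.

    In each modality, the lowest-scoring fraction of nodes is replaced by the
    node at the same patch index from whichever modality scores best there.
    """
    names = list(modalities)
    feats = np.stack([modalities[n] for n in names])                   # (M, N, D)
    scores = np.stack([patch_quality(modalities[n]) for n in names])  # (M, N)
    best = scores.argmax(axis=0)              # best modality per patch index
    n_swap = int(ratio * feats.shape[1])
    swapped = {}
    for m, name in enumerate(names):
        out = feats[m].copy()
        worst = np.argsort(scores[m])[:n_swap]  # lowest-quality nodes here
        out[worst] = feats[best[worst], worst]  # no-op where this modality is best
        swapped[name] = out
    return swapped


# Toy usage: three modalities, 8 patches each, 16-dimensional features.
rng = np.random.default_rng(0)
mods = {name: rng.normal(size=(8, 16)) for name in ("rgb", "nir", "tir")}
graphs = {name: knn_patch_graph(x) for name, x in mods.items()}
mods_swapped = selective_node_swap(mods, ratio=0.25)
```

In the method described in the abstract, the swapped graphs would then be passed to a graph reasoning module for cross-modal propagation. That stage is omitted here because the abstract gives no detail on its form.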
