Reliable Multi-Modal Object Re-Identification via Modality-Aware Graph Reasoning

Xixi Wan, Aihua Zheng, Zi Wang, Bo Jiang, Jin Tang, Jixin Ma

TL;DR

This work tackles multi-modal object ReID when local-feature quality varies across modalities and when a modality is missing. It introduces the Modality-aware Graph Reasoning Network (MGRNet), which combines modality-aware patch graphs, Selective Graph Nodes Swap (SGNS), two graph-reasoning modules (GRMI for modal interaction and GRMM for missing-modality reconstruction), and global-aware multi-head attention for robust fusion. The model is trained with a multi-modality reconstruction loss that jointly constrains feature and structure consistency, and it achieves state-of-the-art results on four benchmarks (RGBNT201, Market1501-MM, RGBNT100, MSVR310). Empirical analyses indicate that SGNS mitigates low-quality local features, GRMM recovers missing modalities via structural cues, and GRMI enhances cross-modal information exchange. Together these yield strong performance in both complete and incomplete modality settings, which matters for real-world ReID systems.

Abstract

Multi-modal data provides abundant and diverse object information, crucial for effective modal interactions in Re-Identification (ReID) tasks. However, existing approaches often overlook the quality variations in local features and fail to fully leverage the complementary information across modalities, particularly in the case of low-quality features. In this paper, we propose to address this issue by leveraging a novel graph reasoning model, termed the Modality-aware Graph Reasoning Network (MGRNet). Specifically, we first construct modality-aware graphs to enhance the extraction of fine-grained local details by effectively capturing and modeling the relationships between patches. Subsequently, the selective graph nodes swap operation is employed to alleviate the adverse effects of low-quality local features by considering both local and global information, enhancing the representation of discriminative information. Finally, the swapped modality-aware graphs are fed into the local-aware graph reasoning module, which propagates multi-modal information to yield a reliable feature representation. Another advantage of the proposed graph reasoning approach is its ability to reconstruct missing modal information by exploiting inherent structural relationships, thereby minimizing disparities between different modalities. Experimental results on four benchmarks (RGBNT201, Market1501-MM, RGBNT100, MSVR310) indicate that the proposed method achieves state-of-the-art performance in multi-modal object ReID. The code for our method will be available upon acceptance.
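
To make the first two steps of the abstract concrete, the following is a minimal sketch of a patch graph followed by a selective node swap. It is not the authors' implementation. The k-nearest-neighbour graph, the quality score (a blend of local agreement with graph neighbours and global agreement with the mean patch feature), the weight `alpha`, and the function names `knn_graph`, `node_quality`, and `selective_swap` are all assumptions introduced for illustration; the paper's actual scoring and swap rules may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def knn_graph(patches, k):
    """Adjacency list: each patch node links to its k most similar patches."""
    n = len(patches)
    neighbors = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: cosine(patches[i], patches[j]),
                        reverse=True)
        neighbors.append(others[:k])
    return neighbors


def node_quality(patches, neighbors, alpha=0.5):
    """Hypothetical quality score: alpha * local agreement (mean similarity to
    graph neighbours) + (1 - alpha) * global agreement (similarity to the
    mean patch feature of the modality)."""
    dim = len(patches[0])
    global_feat = [sum(p[d] for p in patches) / len(patches) for d in range(dim)]
    scores = []
    for i, p in enumerate(patches):
        nbrs = neighbors[i]
        local = sum(cosine(p, patches[j]) for j in nbrs) / max(len(nbrs), 1)
        scores.append(alpha * local + (1 - alpha) * cosine(p, global_feat))
    return scores


def selective_swap(modalities, k=2, num_swap=1):
    """For each modality, replace its num_swap lowest-scoring nodes with the
    same-position node of the best-scoring other modality, but only when
    that donor node scores higher. Returns new lists; inputs are not mutated."""
    names = list(modalities)
    scores = {m: node_quality(modalities[m], knn_graph(modalities[m], k))
              for m in names}
    swapped = {m: [list(p) for p in modalities[m]] for m in names}
    for m in names:
        worst = sorted(range(len(scores[m])), key=lambda i: scores[m][i])[:num_swap]
        for i in worst:
            donor = max((o for o in names if o != m), key=lambda o: scores[o][i])
            if scores[donor][i] > scores[m][i]:
                swapped[m][i] = list(modalities[donor][i])
    return swapped


# Toy example: the RGB patch at index 2 points away from all other patches.
rgb = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [1.0, 0.1]]
nir = [[1.0, 0.1], [0.9, 0.0], [1.0, 0.0], [0.9, 0.1]]
swapped = selective_swap({"rgb": rgb, "nir": nir}, k=2, num_swap=1)
print(swapped["rgb"][2])  # the outlier RGB node is replaced by the NIR node at index 2
```

Swapping by position assumes the modalities are spatially aligned so that patch `i` covers the same image region in each of them. The guard on the donor score keeps a node in place when no other modality offers a better one.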

Paper Structure

This paper contains 22 sections, 22 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: (a) Due to quality differences in local features among modalities, we first split the data to obtain more detailed local information, and then perform an information swap (achieved by GRMI, described in the GRMI section). (b) When TIR is missing in the testing phase, we leverage graph reasoning trained with constraints from modality and structural information to restore features, combining existing RGB and NIR features (nodes) and their relationships (edges); this is achieved by GRMM, described in the GRMM section.
  • Figure 2: The overall network structure of the proposed MGRNet. For complete multi-modal training and testing, multi-branch vision encoders first extract initial features from the multi-modal images. Second, graph reasoning on modal interaction alleviates low-quality tokens in each modality. Finally, global-aware multi-head attention generates the enhanced features, and the fused features are fed into the classifiers to obtain the ReID results. In addition, a graph reasoning on missing modality strategy restores features from their structural relationships when a modality is missing.
  • Figure 3: The process of multiplying selective graph nodes swap by considering both local and global information.
  • Figure 4: The intra-class and inter-class distances of cross-modality features of different methods.
  • Figure 5: Comparison results for GNN methods on the common dataset.
  • ...and 4 more figures
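
The missing-modality case in Figure 1(b), together with the reconstruction loss mentioned in the TL;DR, can be illustrated with a rough sketch. It is not the authors' GRMM, which is trained. Here a single untrained propagation step over a similarity graph stands in for the learned reasoning, and the loss is simply a feature term plus a structure term with equal weight. The function names `affinity`, `reconstruct_missing`, and `reconstruction_loss`, the averaging of available modalities, and the loss weighting are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def affinity(nodes):
    """Row-normalised, non-negative cosine affinities with self-loops (edges)."""
    n = len(nodes)
    rows = []
    for i in range(n):
        row = [max(cosine(nodes[i], nodes[j]), 0.0) for j in range(n)]
        row[i] = 1.0  # self-loop keeps each row sum strictly positive
        total = sum(row)
        rows.append([w / total for w in row])
    return rows


def reconstruct_missing(available):
    """Estimate node features of a missing modality from the available ones.

    available: dict mapping modality name -> list of node feature vectors,
    all with the same node count and feature dimension.
    """
    mods = list(available.values())
    n, dim = len(mods[0]), len(mods[0][0])
    # Nodes: average the available modalities position by position.
    fused = [[sum(m[i][d] for m in mods) / len(mods) for d in range(dim)]
             for i in range(n)]
    # Edges: relationships among the fused nodes.
    adj = affinity(fused)
    # One propagation step: each node aggregates its neighbours' features.
    return [[sum(adj[i][j] * fused[j][d] for j in range(n)) for d in range(dim)]
            for i in range(n)]


def reconstruction_loss(pred, target):
    """Assumed loss: feature consistency (MSE on node features) plus
    structure consistency (MSE on the two affinity matrices)."""
    n, dim = len(pred), len(pred[0])
    feat = sum((pred[i][d] - target[i][d]) ** 2
               for i in range(n) for d in range(dim)) / (n * dim)
    a_pred, a_tgt = affinity(pred), affinity(target)
    struct = sum((a_pred[i][j] - a_tgt[i][j]) ** 2
                 for i in range(n) for j in range(n)) / (n * n)
    return feat + struct


# Toy example: TIR is missing at test time; RGB and NIR are available.
rgb = [[1.0, 0.0], [0.8, 0.2], [0.9, 0.1]]
nir = [[0.9, 0.1], [1.0, 0.0], [0.8, 0.2]]
tir_hat = reconstruct_missing({"rgb": rgb, "nir": nir})
tir_true = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]  # made-up target for the demo
loss = reconstruction_loss(tir_hat, tir_true)
```

In a trained model the propagation would carry learnable weights, and the loss would be minimised during training on samples where all modalities are present, so that the structure learned from complete data can be reused when one modality is absent.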