Distribution-Consistency-Guided Multi-modal Hashing

Jin-Yu Liu, Xian-Ling Mao, Tian-Yi Che, Rong-Cheng Tu

TL;DR

This work addresses robust multi-modal retrieval when training labels are noisy. It introduces DCGMH, which leverages a distribution-consistency pattern between the 1-0 label distribution and hash-code similarity to category centers to filter and reconstruct noisy labels. The method partitions data into clean, corrected, and unlabeled subsets, applying a multi-term objective that combines pointwise, pairwise, unsupervised, center, and quantization losses to learn high-quality hash codes. Extensive experiments on MIR Flickr, NUS-WIDE, and MS COCO show DCGMH consistently outperforms state-of-the-art baselines under noisy-label settings, validating its robustness and practical impact for real-world retrieval tasks.

Abstract

Multi-modal hashing methods have gained popularity due to their fast retrieval speed and low storage requirements. Among them, supervised methods outperform unsupervised ones by using labels as supervisory signals. Almost all supervised multi-modal hashing methods, however, implicitly assume that the training set contains no noisy labels. In real-world scenarios, labels are often annotated incorrectly during manual labeling, which can greatly harm retrieval performance. To address this issue, we first discover a significant distribution consistency pattern through experiments, i.e., the 1-0 distribution of the presence or absence of each category in the label is consistent with the high-low distribution of similarity scores of the hash codes relative to category centers. Then, inspired by this pattern, we propose a novel Distribution-Consistency-Guided Multi-modal Hashing (DCGMH), which aims to filter and reconstruct noisy labels to enhance retrieval performance. Specifically, the proposed method first randomly initializes several category centers, which are used to compute the high-low distribution of similarity scores. Noisy and clean labels are then separated via the discovered distribution consistency pattern to mitigate the impact of noisy labels. Subsequently, a correction strategy, designed indirectly from the distribution consistency pattern, is applied to the filtered noisy labels: high-confidence ones are corrected, while low-confidence ones are treated as unlabeled for unsupervised learning, further enhancing the model's performance. Extensive experiments on three widely used datasets demonstrate the superiority of the proposed method over state-of-the-art baselines in multi-modal retrieval tasks. The code is available at https://github.com/LiuJinyu1229/DCGMH.
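
The filtering and correction idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that); it only shows one plausible way to compare a 1-0 label vector against the high-low similarity of a hash code to category centers. The function names, the fixed toy centers, and the thresholds `margin`, `sim_threshold`, and `confident_gap` are all assumptions made for illustration.

```python
import numpy as np

def center_similarity(codes, centers):
    """Cosine similarity between +/-1 hash codes and +/-1 category centers.

    For +/-1 vectors of length K, cosine similarity equals dot product / K.
    """
    return codes @ centers.T / codes.shape[1]  # shape (n_samples, n_categories)

def consistency_gap(sim, labels):
    """Mean similarity to labeled categories minus mean similarity to the rest."""
    mask = labels.astype(bool)
    in_count = np.maximum(mask.sum(axis=1), 1)      # avoid division by zero
    out_count = np.maximum((~mask).sum(axis=1), 1)
    in_mean = np.where(mask, sim, 0.0).sum(axis=1) / in_count
    out_mean = np.where(~mask, sim, 0.0).sum(axis=1) / out_count
    return in_mean - out_mean

def partition(codes, centers, labels, margin=0.1, sim_threshold=0.5, confident_gap=0.5):
    """Split samples into clean / corrected / unlabeled (hypothetical rule)."""
    sim = center_similarity(codes, centers)
    # Clean: the given labels agree with the high-low similarity pattern.
    clean = consistency_gap(sim, labels) > margin
    # Candidate replacement labels: categories whose center is highly similar.
    proposed = (sim > sim_threshold).astype(labels.dtype)
    has_label = proposed.sum(axis=1) > 0
    # Corrected: noisy samples whose candidate labels are strongly consistent.
    corrected = ~clean & has_label & (consistency_gap(sim, proposed) > confident_gap)
    # Unlabeled: remaining noisy samples, left for unsupervised learning.
    unlabeled = ~clean & ~corrected
    new_labels = np.where(corrected[:, None], proposed, labels)
    return clean, corrected, unlabeled, new_labels

# Toy data: three mutually orthogonal 8-bit centers (fixed here for clarity;
# the abstract states that the method initializes centers randomly).
centers = np.array([
    [1,  1,  1,  1, 1,  1,  1,  1],
    [1, -1,  1, -1, 1, -1,  1, -1],
    [1,  1, -1, -1, 1,  1, -1, -1],
])
codes = np.array([
    [1,  1,  1,  1,  1,  1,  1,  1],   # matches center 0
    [1, -1,  1, -1,  1, -1,  1, -1],   # matches center 1
    [1,  1,  1,  1, -1, -1, -1, -1],   # orthogonal to every center
])
labels = np.array([
    [1, 0, 0],   # consistent with the code
    [1, 0, 0],   # noisy: the code is close to center 1, not center 0
    [0, 0, 1],   # not supported by any center similarity
])

clean, corrected, unlabeled, new_labels = partition(codes, centers, labels)
print(clean, corrected, unlabeled)
print(new_labels)
```

In this toy run the first sample is kept as clean, the second has its label replaced by the category of its nearest center, and the third is treated as unlabeled because no center gives a confident signal. Samples flagged as unlabeled keep their original row in `new_labels`; a training loop would be expected to ignore it.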

Paper Structure

This paper contains 30 sections, 18 equations, 6 figures, 2 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Box plot comparison of average similarity scores for in-category and out-category cases on clean and noisy label datasets. "Out-Category(0)" shows the distribution of the average similarity scores between hash codes and all categories they do not belong to, while "In-Category(1)" shows the distribution for the categories they do belong to. The horizontal line within each box indicates the median of all average similarity scores.
  • Figure 2: The architecture of our proposed DCGMH.
  • Figure 3: Precision@N curves at a noisy label ratio of 40% on the three benchmark datasets with the code length of 64 bits.
  • Figure 4: PR curves at a noisy label ratio of 40% on the three benchmark datasets with the code length of 64 bits.
  • Figure 5: MAPs at different noisy label ratios on the training set of three benchmark datasets with the code length of 64 bits.
  • ...and 1 more figure