Distribution-Consistency-Guided Multi-modal Hashing
Jin-Yu Liu, Xian-Ling Mao, Tian-Yi Che, Rong-Cheng Tu
TL;DR
This work addresses robust multi-modal retrieval when training labels are noisy. It introduces DCGMH, which exploits a distribution-consistency pattern to filter and reconstruct noisy labels: the 1-0 (present/absent) distribution of categories in a label is consistent with the high-low distribution of similarity scores between the hash code and the category centers. The method partitions the training data into clean, corrected, and unlabeled subsets, and trains with a multi-term objective that combines pointwise, pairwise, unsupervised, center, and quantization losses to learn high-quality hash codes. Experiments on MIR Flickr, NUS-WIDE, and MS COCO show that DCGMH outperforms state-of-the-art baselines under noisy-label settings, supporting its robustness for real-world retrieval tasks.
Abstract
Multi-modal hashing methods have gained popularity due to their high retrieval speed and low storage requirements. Among them, supervised methods outperform unsupervised ones by using labels as supervisory signals. Currently, almost all supervised multi-modal hashing methods implicitly assume that training sets contain no noisy labels. In real-world scenarios, however, manual annotation often produces incorrect labels, which greatly harms retrieval performance. To address this issue, we first discover a significant distribution consistency pattern through experiments, i.e., the 1-0 distribution of the presence or absence of each category in the label is consistent with the high-low distribution of similarity scores of the hash codes relative to category centers. Then, inspired by this pattern, we propose a novel Distribution-Consistency-Guided Multi-modal Hashing (DCGMH), which aims to filter and reconstruct noisy labels to enhance retrieval performance. Specifically, the proposed method first randomly initializes several category centers, which are used to compute the high-low distribution of similarity scores; noisy and clean labels are then separated via the discovered distribution consistency pattern to mitigate the impact of noisy labels; subsequently, a correction strategy, which is indirectly designed via the distribution consistency pattern, is applied to the filtered noisy labels, correcting high-confidence ones while treating low-confidence ones as unlabeled for unsupervised learning, thereby further enhancing the model's performance. Extensive experiments on three widely used datasets demonstrate the superiority of the proposed method compared to state-of-the-art baselines in multi-modal retrieval tasks. The code is available at https://github.com/LiuJinyu1229/DCGMH.
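To make the filter-then-correct idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes cosine similarity between hash codes and category centers, and it measures consistency as the gap between the mean similarity to categories marked present and the mean similarity to categories marked absent. The function names (`consistency_gap`, `partition`) and the thresholds (`clean_gap`, `sim_threshold`) are hypothetical choices made for this example; the abstract does not specify the exact criteria DCGMH uses.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def consistency_gap(code, label, centers):
    """Mean similarity to 'present' categories minus mean similarity to 'absent' ones.

    A large positive gap means the 1-0 label pattern agrees with the
    high-low similarity pattern. (Assumed measure, for illustration only.)
    """
    sims = [cosine(code, c) for c in centers]
    present = [s for s, y in zip(sims, label) if y == 1]
    absent = [s for s, y in zip(sims, label) if y == 0]
    if not present or not absent:
        return 0.0  # no contrast available, so treat as not confident
    return sum(present) / len(present) - sum(absent) / len(absent)

def partition(codes, labels, centers, clean_gap=0.5, sim_threshold=0.5):
    """Split samples into clean, corrected, and unlabeled subsets.

    clean_gap and sim_threshold are hypothetical hyperparameters.
    """
    clean, corrected, unlabeled = [], [], []
    for i, (code, label) in enumerate(zip(codes, labels)):
        if consistency_gap(code, label, centers) >= clean_gap:
            clean.append((i, list(label)))  # label agrees with similarities
            continue
        # Label looks noisy: rebuild a candidate label from the similarities.
        sims = [cosine(code, c) for c in centers]
        pseudo = [1 if s >= sim_threshold else 0 for s in sims]
        if consistency_gap(code, pseudo, centers) >= clean_gap:
            corrected.append((i, pseudo))   # high-confidence correction
        else:
            unlabeled.append(i)             # low confidence: drop the label
    return clean, corrected, unlabeled

# Toy data: three mutually orthogonal 4-bit category centers.
centers = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1]]
codes = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, 1, -1]]
labels = [[1, 0, 0], [1, 0, 0], [0, 0, 1]]  # the second label is wrong

clean, corrected, unlabeled = partition(codes, labels, centers)
print(clean, corrected, unlabeled)
```

In this toy run, the first sample is kept as clean, the second has its label rebuilt as `[0, 1, 0]` because its code matches the second center, and the third is equally similar to all centers, so it falls into the unlabeled subset. In the full method, the three subsets would then feed the supervised and unsupervised training stages, with the centers and hash codes learned jointly rather than fixed as they are here.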
