Deep Class-guided Hashing for Multi-label Cross-modal Retrieval

Hao Chen, Lei Zhu, Xinghui Zhu

TL;DR

This paper uses proxy loss as the primary objective to maintain intra-class aggregation of data, adds pairwise loss to preserve inter-class structural relationships, and proposes a variance constraint to address the semantic bias that this combination introduces.

Abstract

Deep hashing is widely valued in cross-modal retrieval for its low cost and efficient retrieval. However, existing cross-modal hashing methods either explore the relationships between data points, which inevitably leads to intra-class dispersion, or explore the relationships between data points and categories while ignoring inter-class structural relationships. Both approaches produce suboptimal hash codes. To maintain both intra-class aggregation and inter-class structural relationships, this paper proposes the DCGH method. Specifically, we use proxy loss as the primary objective to maintain intra-class aggregation of data, combine it with pairwise loss to preserve inter-class structural relationships, and, on this basis, propose a variance constraint to address the semantic bias that the combination introduces. Extensive comparative experiments on three benchmark datasets show that DCGH performs comparably to or better than existing cross-modal retrieval methods. The code for the implementation of our DCGH framework is available at https://github.com/donnotnormal/DCGH.
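
The division of labor described in the abstract can be illustrated with a toy sketch. The cosine-based forms, the function names (proxy_term, pairwise_term), and the rule that two samples are related when they share at least one label are illustrative assumptions, not the loss definitions used in DCGH. The proxy term compares a continuous code with one learnable proxy per class, while the pairwise term compares two codes with each other.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def proxy_term(code, labels, proxies):
    """Hypothetical proxy term: pull a code towards the proxies of its own
    classes and push it away from the proxies of all other classes."""
    pos = [1.0 - cosine(code, p) for p, y in zip(proxies, labels) if y]
    neg = [max(0.0, cosine(code, p)) for p, y in zip(proxies, labels) if not y]
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(pos) + mean(neg)

def pairwise_term(code_a, labels_a, code_b, labels_b):
    """Hypothetical pairwise term: samples sharing at least one label should
    have similar codes; samples with no common label should not."""
    related = any(x and y for x, y in zip(labels_a, labels_b))
    sim = cosine(code_a, code_b)
    return 1.0 - sim if related else max(0.0, sim)

# Toy data: three classes, one (assumed) proxy per class.
proxies = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
code_a, labels_a = [1.0, 1.0, 0.0], [1, 1, 0]
code_b, labels_b = [0.0, 1.0, 1.0], [0, 1, 1]

print(proxy_term(code_a, labels_a, proxies))
print(pairwise_term(code_a, labels_a, code_b, labels_b))
```

In this sketch the proxy term only sees sample-to-class relationships, so it cannot express how two multi-label samples relate to each other. The pairwise term supplies that sample-to-sample signal.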

Paper Structure

This paper contains 24 sections, 18 equations, 13 figures, 6 tables, 2 algorithms.

Figures (13)

  • Figure 1: Illustration of hash codes learned using different hash losses. Classes S1 and S2 have common labels, S2 and S3 have common labels, while S1 and S3 have no common labels. Compared with pairwise loss, triplet loss, center loss, and proxy loss, our approach maintains both intra-class aggregation and inter-class structural relationships.
  • Figure 2: An overview of our proposed DCGH framework, which has two parts: (1) Feature Learning: Two feature extractors with different Transformer encoders extract representative semantic features from the image and text modalities, respectively. (2) Hash Learning: Proxy loss and pairwise loss are combined to explore both point-to-point and point-to-class relationships, and a variance constraint is introduced to prevent semantic bias by keeping the relationship between each data point and its relevant proxy points consistent.
  • Figure 3: Class S1 is related to proxy points P1, P2, and P3, while class S2 is related to proxy points P1 and P4. When point-to-proxy relationships and point-to-point relationships are considered together, if more data points are related to P1 than to the other proxies, the data points will be biased towards P1. However, each data point should maintain a consistent distance relationship with each of its related proxies.
  • Figure 4: Some examples of image-text pairs in MIRFLICKR-25K, NUS-WIDE and MS COCO.
  • Figure 5: Precision-Recall curves on MIRFLICKR-25K w.r.t. 16 bits and 32 bits.
  • ...and 8 more figures
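
The bias described in the Figure 3 caption can be made concrete with a small sketch. It assumes, purely for illustration, that the variance constraint penalizes the variance of a code's cosine similarities to its relevant proxies. The function name variance_term and the toy proxies are hypothetical, and the paper's actual formulation may differ.

```python
import math
from statistics import pvariance

def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def variance_term(code, labels, proxies):
    """Hypothetical variance constraint: variance of the similarities between
    a code and the proxies of the classes it belongs to. It is zero when the
    code is equally close to every relevant proxy."""
    sims = [cosine(code, p) for p, y in zip(proxies, labels) if y]
    return pvariance(sims) if len(sims) > 1 else 0.0

# Four assumed proxies P1..P4; class S1 is related to P1, P2 and P3.
proxies = [
    [1.0, 0.0, 0.0, 0.0],  # P1
    [0.0, 1.0, 0.0, 0.0],  # P2
    [0.0, 0.0, 1.0, 0.0],  # P3
    [0.0, 0.0, 0.0, 1.0],  # P4
]
s1_labels = [1, 1, 1, 0]

balanced = [1.0, 1.0, 1.0, 0.0]  # equally close to P1, P2 and P3
biased = [3.0, 1.0, 1.0, 0.0]    # pulled towards P1

print(variance_term(balanced, s1_labels, proxies))
print(variance_term(biased, s1_labels, proxies))
```

Under this assumption the balanced code incurs no penalty, while the code pulled towards P1 does, which matches the caption's requirement that a data point keep a consistent distance relationship with each of its related proxies.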