Weakly Supervised Deep Hyperspherical Quantization for Image Retrieval

Jinpeng Wang, Bin Chen, Qiang Zhang, Zaiqiao Meng, Shangsong Liang, Shu-Tao Xia

TL;DR

WSDHQ tackles the challenge of image retrieval with weakly tagged data by learning a deep hyperspherical quantizer that preserves semantic information from tags. It enhances weak supervision through a tag correlation graph and DBSCAN-driven tag merging, then maps image features onto a semantic hypersphere and trains a cosine-based semantics-preserving loss together with a cosine quantization loss. An alternating optimization scheme updates network parameters, codes, and codebooks, achieving state-of-the-art results on MIR-FLICKR25K and NUS-WIDE in the weakly supervised setting. The work enables scalable, label-light deep quantization for large-scale image retrieval, with practical impact for leveraging web and social media data. $B$-bit codes and cosine-based supervision on a semantic hypersphere underpin the method’s accuracy and efficiency.

Abstract

Deep quantization methods have shown high efficiency on large-scale image retrieval. However, current models heavily rely on ground-truth information, hindering the application of quantization in label-hungry scenarios. A more realistic demand is to learn from inexhaustible uploaded images that are associated with informal tags provided by amateur users. Though such sketchy tags do not obviously reveal the labels, they actually contain useful semantic information for supervising deep quantization. To this end, we propose Weakly-Supervised Deep Hyperspherical Quantization (WSDHQ), which is the first work to learn deep quantization from weakly tagged images. Specifically, 1) we use word embeddings to represent the tags and enhance their semantic information based on a tag correlation graph. 2) To better preserve semantic information in quantization codes and reduce quantization error, we jointly learn semantics-preserving embeddings and a supervised quantizer on a hypersphere by employing a well-designed fusion layer and tailor-made loss functions. Extensive experiments show that WSDHQ can achieve state-of-the-art performance on weakly-supervised compact coding. Code is available at https://github.com/gimpong/AAAI21-WSDHQ.
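
To make the idea of a cosine quantization loss on a hypersphere more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The helper names (`l2_normalize`, `greedy_quantize`, `cosine_quantization_loss`), the greedy residual assignment, and the toy codebooks are all assumptions made for illustration. The loss form `1 - cos(z, reconstruction)` is one plausible reading of "cosine quantization loss", not a formula taken from the paper.

```python
import math


def l2_normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))


def greedy_quantize(z, codebooks):
    """Pick one codeword per codebook by greedy residual matching.

    Assumption: a simple greedy scheme stands in for whatever code
    assignment step the paper's alternating optimization uses.
    Returns the chosen indices and the sum of the chosen codewords.
    """
    residual = list(z)
    recon = [0.0] * len(z)
    codes = []
    for codebook in codebooks:
        # Choose the codeword closest (squared Euclidean) to the residual.
        best = min(
            range(len(codebook)),
            key=lambda k: sum((r - c) ** 2 for r, c in zip(residual, codebook[k])),
        )
        codes.append(best)
        recon = [a + c for a, c in zip(recon, codebook[best])]
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return codes, recon


def cosine_quantization_loss(feature, codebooks):
    """Hypothetical loss: 1 - cos(normalized feature, its reconstruction)."""
    z = l2_normalize(feature)
    codes, recon = greedy_quantize(z, codebooks)
    return codes, 1.0 - cosine(z, recon)


# Toy setup: 2 codebooks with 2 codewords each, so each code uses
# 2 * log2(2) = 2 bits.
coarse = [[1.0, 0.0], [0.0, 1.0]]
fine = [[0.1, 0.0], [0.0, 0.1]]
feature = [3.0, 0.2]

codes_one, loss_one = cosine_quantization_loss(feature, [coarse])
codes_two, loss_two = cosine_quantization_loss(feature, [coarse, fine])
print(codes_one, loss_one)
print(codes_two, loss_two)
```

In this toy setting, adding the second codebook brings the reconstruction closer in angle to the normalized feature, so the loss shrinks. Because the loss depends only on direction, the norm of the reconstruction does not matter, which is the practical appeal of measuring quantization error with cosine similarity on a hypersphere.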

Paper Structure (14 sections, 10 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: An example from the NUS-WIDE dataset to illustrate the problem of weakly supervised quantization using tags.
  • Figure 2: The proposed Weakly Supervised Deep Hyperspherical Quantization (WSDHQ) consists of five main parts: 1) a standard CNN, 2) a word embedding model, 3) a correlation graph, 4) a transform layer and 5) a semantic hypersphere.
  • Figure 3: Precision-recall curves on the MIR-FLICKR25K and NUS-WIDE datasets with binary codes @ 32 bits.
  • Figure 4: Precision@top-N curves on the MIR-FLICKR25K and NUS-WIDE datasets with binary codes @ 32 bits.
  • Figure 5: The MAP results of WSDHQ @ 32 bits w.r.t. $\gamma$ and $\lambda$ on two datasets. The values on dotted lines are the MAP results of WDHT (i.e., the best baseline) @ 32 bits.