CoopHash: Cooperative Learning of Multipurpose Descriptor and Contrastive Pair Generator via Variational MCMC Teaching for Supervised Image Hashing

Khoa D. Doan, Jianwen Xie, Yaxuan Zhu, Yang Zhao, Ping Li

TL;DR

CoopHash tackles limited labeled data in supervised image hashing by integrating energy-based cooperative learning with a variational MCMC teaching framework. It jointly learns a contrastive pair generator and a multipurpose descriptor with four heads (density, inference, hashing, and discrimination) to generate informative contrastive samples and produce robust binary hash codes. Training combines maximum-likelihood objectives for the descriptor, a variational-autoencoder-style objective for the generator, and a triplet-based hashing loss, guided by short-run MCMC steps. Experiments on NUS-WIDE, COCO, and CIFAR-10 demonstrate state-of-the-art retrieval performance and strong robustness to out-of-distribution and corrupted data, outperforming GAN-based methods and prior hashing approaches.
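
The "short-run MCMC steps" mentioned above are commonly realized as a few Langevin updates that start from a generator sample and move it toward low-energy regions of the descriptor. The following is a minimal sketch under stated assumptions, not the authors' implementation: the quadratic toy energy, the function names `energy_grad` and `short_run_langevin`, and the step size and step count are all hypothetical stand-ins for a learned neural energy function.

```python
import random

def energy_grad(x, mu):
    # Gradient of a toy energy E(x) = 0.5 * ||x - mu||^2.
    # Assumption: this stands in for the gradient of a learned descriptor energy.
    return [xi - mi for xi, mi in zip(x, mu)]

def short_run_langevin(x_init, mu, n_steps=20, step_size=0.5,
                       add_noise=True, rng=None):
    # Langevin update: x <- x - (s^2 / 2) * grad E(x) + s * eps, eps ~ N(0, I).
    # x_init plays the role of a generator sample that initializes the chain.
    rng = rng or random.Random(0)
    x = list(x_init)
    for _ in range(n_steps):
        grad = energy_grad(x, mu)
        x = [
            xi - 0.5 * step_size ** 2 * gi
            + (step_size * rng.gauss(0.0, 1.0) if add_noise else 0.0)
            for xi, gi in zip(x, grad)
        ]
    return x

def toy_energy(x, mu):
    # Energy value matching energy_grad above.
    return 0.5 * sum((xi - mi) ** 2 for xi, mi in zip(x, mu))

if __name__ == "__main__":
    mu = [0.0, 0.0]
    start = [2.0, -2.0]  # hypothetical generator output
    refined = short_run_langevin(start, mu, add_noise=False)
    print(toy_energy(start, mu), toy_energy(refined, mu))
```

With the noise term switched off, the update reduces to gradient descent on the energy, which makes the refinement easy to check; with noise enabled, the chain produces approximate samples rather than a single mode.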

Abstract

Leveraging supervised information can lead to superior retrieval performance in the image hashing domain, but the performance degrades significantly without enough labeled data. One effective solution to boost performance is to employ generative models, such as Generative Adversarial Networks (GANs), to generate synthetic data in an image hashing model. However, GAN-based methods are difficult to train, which prevents the hashing approaches from jointly training the generative models and the hash functions. This limitation results in sub-optimal retrieval performance. To overcome this limitation, we propose a novel framework, the generative cooperative hashing network, which is based on energy-based cooperative learning. This framework jointly learns a powerful generative representation of the data and a robust hash function via two components: a top-down contrastive pair generator that synthesizes contrastive images and a bottom-up multipurpose descriptor that simultaneously represents the images from multiple perspectives, including probability density, hash code, latent code, and category. The two components are jointly learned via a novel likelihood-based cooperative learning scheme. We conduct experiments on several real-world datasets and show that the proposed method outperforms competing supervised hashing methods, achieving up to 10% relative improvement over the current state-of-the-art supervised hashing methods, and exhibits significantly better performance in out-of-distribution retrieval.

Paper Structure (39 sections, 20 equations, 9 figures, 6 tables, 1 algorithm)

Figures (9)

  • Figure 1: CoopHash consists of two main components: 1) a contrastive pair generator that takes as input the concatenation of a random noise vector $z$ and a label $c^+$ and synthesizes a contrastive image pair $\{\hat{x}^+,\hat{x}^-\}$, with $\hat{x}^+$ from the same class $c^+$ and $\hat{x}^-$ from a different class $c^-$; 2) a multipurpose descriptor that describes the images in multiple ways, including an explicit density model $p(x|c)$, a variational inference model $p(z|x)$, a discriminative model $p(c|x)$, and a hashing model. All four models share a base bottom-up representational network. The multipurpose descriptor network is trained with a loss that combines negative log-likelihood, variational loss, triplet-ranking loss, and classification loss, while the contrastive pair generator learns from the descriptor and serves as a fast initializer of the descriptor's MCMC. In the retrieval phase, only the hashing computational path is used; the binary hash codes are the signs of the real-valued outputs.
  • Figure 2: mAP performance of OOD retrieval experiments.
  • Figure 3: Ablation Study. $-\mathcal{L}_{\text{CLASS}}$: CoopHash without Discriminative head. $-\mathcal{L}_{\text{TR}}$: CoopHash without Hash head. $-\mathcal{L}_{\text{NLL}}$: CoopHash without Energy head. $-\mathcal{L}_{\text{VAE}}$: CoopHash without Inference head. ALL: CoopHash.
  • Figure 4: mAP, FID, and classification accuracy of the different heads during model training on the MNIST and CIFAR-10 datasets. In each task, Baseline is a model with a similar architecture that is trained only for that specific task (i.e., hashing or classification). GAN-based is a GAN-based model similar to CoopHash.
  • Figure 5: The t-SNE visualizations of the quantized 32-bit hash codes learned by HashNet, HashGAN and CoopHash on the CIFAR-10 dataset.
  • ...and 4 more figures
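
The Figure 1 caption describes four heads on a shared bottom-up network, with binary codes obtained as the signs of the hash head's real-valued outputs. The structure can be sketched roughly as below. This is an illustrative toy under stated assumptions, not the paper's architecture: the single tanh layer, the random weights, the scalar energy without class conditioning, the squared-Euclidean triplet margin loss, and all names (`MultipurposeDescriptor`, `hash_code`, `hamming`, `triplet_loss`) are hypothetical.

```python
import math
import random

def linear(weights, x):
    # Matrix-vector product; weights is a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

class MultipurposeDescriptor:
    # Toy stand-in: one shared base layer feeding four linear heads.
    def __init__(self, in_dim, feat_dim, n_classes, latent_dim, n_bits, seed=0):
        rng = random.Random(seed)
        self.base = rand_matrix(feat_dim, in_dim, rng)         # shared network
        self.energy_w = rand_matrix(1, feat_dim, rng)          # density head
        self.latent_w = rand_matrix(latent_dim, feat_dim, rng) # inference head
        self.class_w = rand_matrix(n_classes, feat_dim, rng)   # discriminative head
        self.hash_w = rand_matrix(n_bits, feat_dim, rng)       # hashing head

    def forward(self, x):
        h = [math.tanh(v) for v in linear(self.base, x)]
        return {
            "energy": linear(self.energy_w, h)[0],  # class conditioning omitted here
            "latent": linear(self.latent_w, h),
            "logits": linear(self.class_w, h),
            "hash": linear(self.hash_w, h),         # real-valued, pre-binarization
        }

    def hash_code(self, x):
        # Retrieval uses only the hashing path; the code is the sign of each output.
        # Assumption: an output of exactly zero maps to +1.
        return [1 if v >= 0 else -1 for v in self.forward(x)["hash"]]

def hamming(code_a, code_b):
    # Number of positions where two equal-length codes differ.
    return sum(1 for a, b in zip(code_a, code_b) if a != b)

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Standard margin-based triplet loss on real-valued hash outputs,
    # using squared Euclidean distance (an assumed choice of distance).
    d_ap = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_an = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, margin + d_ap - d_an)

if __name__ == "__main__":
    model = MultipurposeDescriptor(in_dim=4, feat_dim=8, n_classes=3,
                                   latent_dim=2, n_bits=16)
    print(model.hash_code([0.5, -1.0, 0.25, 2.0]))
```

Sharing the base layer means the hash head is trained on features that are also shaped by the density, inference, and classification objectives, which is the design the caption describes; at query time only `hash_code` and a Hamming comparison would be needed.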