Zero-Shot Hashing Based on Reconstruction With Part Alignment

Yan Jiang, Zhongmiao Qi, Jianhao Li, Jiangbo Qian, Chong Wang, Yu Xin

TL;DR

RAZH addresses zero-shot hashing by aligning image parts with semantic attributes rather than aligning attributes with the whole image, which mitigates the noise introduced by coarse alignment. It introduces a four-module architecture with a dual-branch reconstruction strategy that clusters patches, replaces them with attribute vectors, and jointly optimizes hash, classification, and reconstruction losses. Empirical results on CIFAR10, CUB, and AWA2 show substantial improvements over state-of-the-art methods, and ablations confirm the critical role of part-level alignment and of the proposed loss components. The method generalizes well to unseen classes and offers a scalable, patch-centric approach to cross-class retrieval. Suggested future work includes model compression and training-time acceleration to improve practicality.

Abstract

Hashing algorithms are widely used in large-scale image retrieval, especially for seen-class data, and zero-shot hashing algorithms have been proposed to handle unseen-class data. The key technique in these algorithms is to learn features from seen classes and transfer them to unseen classes, that is, to align the feature embeddings of the seen and unseen classes. Most existing zero-shot hashing algorithms rely on the attributes shared by seen and unseen classes to perform this alignment. However, the attributes are always described for a whole image, even though each one refers to a specific part of the image. These methods therefore neglect the alignment of attributes with their corresponding image parts, which introduces noise and reduces the accuracy of the alignment between seen-class and unseen-class features. To address this problem, we propose a new zero-shot hashing method called RAZH. We first use a clustering algorithm to group similar patches into image parts for attribute matching, and then replace the image parts with the corresponding attribute vectors, gradually aligning each part with its nearest attribute. Extensive evaluation results demonstrate the superiority of RAZH over several state-of-the-art methods.
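
The part-alignment idea in the abstract (cluster patches into parts, match each part to its nearest attribute, replace the part with that attribute vector) can be illustrated with a small sketch. This is not the authors' implementation. It assumes plain k-means as the clustering algorithm, cosine similarity as the notion of "nearest", and patch features that already share the attribute vectors' dimension; the abstract fixes none of these choices. The function names `kmeans` and `align_parts` are hypothetical, and the gradual, training-time nature of the alignment is not modeled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iters=10):
    """Plain k-means (an assumed choice of clustering algorithm).

    Uses the first k points as initial centroids so the result is
    deterministic. Returns one cluster label per point.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each patch joins its closest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: recompute centroids of non-empty clusters.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


def align_parts(patches, attributes, k):
    """Group patches into k parts and swap each part for its nearest attribute.

    Returns (labels, part_to_attr, replaced), where `replaced` is a copy of
    `patches` in which every patch holds the attribute vector matched to
    its part.
    """
    labels = kmeans(patches, k)
    replaced = [list(p) for p in patches]
    part_to_attr = {}
    for c in sorted(set(labels)):
        members = [p for p, lab in zip(patches, labels) if lab == c]
        centroid = mean(members)
        # "Nearest" attribute = highest cosine similarity (an assumption).
        best = max(range(len(attributes)),
                   key=lambda j: cosine(centroid, attributes[j]))
        part_to_attr[c] = best
        for i, lab in enumerate(labels):
            if lab == c:
                replaced[i] = list(attributes[best])
    return labels, part_to_attr, replaced


if __name__ == "__main__":
    # Toy data: four 2-D patch features and two 2-D attribute vectors.
    patches = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
    attributes = [[0.0, 2.0], [2.0, 0.0]]
    print(align_parts(patches, attributes, k=2))
```

In this toy run, each patch ends up carrying the attribute vector whose direction best matches its part. In a real model the patch and attribute embeddings would come from learned encoders and would likely need a projection into a common space.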

Paper Structure

This paper contains 6 sections, 22 equations, 11 figures, 8 tables, 1 algorithm.

Figures (11)

  • Figure 1: Framework of zero-shot hashing algorithm. The blue background represents the model training process, while the yellow background represents the retrieval process.
  • Figure 2: The architecture of RAZH, which is composed of four key modules: 1) an input module: inputs to the model for training; 2) a visual embedding module: aligns patches and attributes through reconstruction operations; 3) an attribute embedding module: replaces patches in attribute embedding; and 4) a hashing learning module: integrates overall losses to train the model. The idea behind RAZH is to replace unselected image patches with attribute patches that have identical semantic information, thereby fusing the image and attribute data for joint training.
  • Figure 3: On the three zero-shot hashing datasets, the mAP tends to increase as the hash code length increases.
  • Figure 4: Performance (PR curve, P@N curve, and R@N curve) with 64-bit hash codes on the AWA2 dataset.
  • Figure 5: On the three zero-shot hashing datasets, the confusion matrix of hash code distances between the seen classes and all classes.
  • ...and 6 more figures
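
Figures 3 and 4 report mAP and P@N over binary hash codes. As a rough illustration of how such metrics are commonly computed, the sketch below ranks a database by Hamming distance to each query and scores the ranking against class labels. It is a generic evaluation sketch, not the paper's evaluation code: the function names are hypothetical, codes are assumed to be lists of 0/1 bits, and a retrieved item is assumed to be relevant when its class label equals the query's.

```python
def hamming(a, b):
    """Number of bit positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query, database):
    """Database indices sorted by Hamming distance to the query.

    sorted() is stable, so ties keep their original database order.
    """
    return sorted(range(len(database)),
                  key=lambda i: hamming(query, database[i]))


def precision_at_n(ranked_relevance, n):
    """Fraction of relevant items among the top-n ranked results (P@N)."""
    top = ranked_relevance[:n]
    return sum(top) / len(top) if top else 0.0


def average_precision(ranked_relevance):
    """Mean of the precision values taken at each relevant rank position."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(queries, query_labels, database, db_labels):
    """mAP over all queries, with relevance defined by matching labels."""
    aps = []
    for q, q_label in zip(queries, query_labels):
        order = rank_by_hamming(q, database)
        relevance = [1 if db_labels[i] == q_label else 0 for i in order]
        aps.append(average_precision(relevance))
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Toy 4-bit codes; real experiments would use longer codes, e.g. 64 bits.
    database = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 1], [1, 1, 1, 0]]
    db_labels = ["a", "b", "a", "b"]
    queries = [[0, 0, 0, 0], [0, 1, 1, 1]]
    query_labels = ["a", "a"]
    print(mean_average_precision(queries, query_labels, database, db_labels))
```

Evaluation protocols differ in details such as truncating the ranking at a fixed depth or how ties in Hamming distance are broken, so numbers from a sketch like this are only comparable to published results when those details match.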