Codebook-Centric Deep Hashing: End-to-End Joint Learning of Semantic Hash Centers and Neural Hash Function

Shuo Yin, Zhiyuan Yin, Yuqing Hou, Rui Liu, Yong Chen, Dell Zhang

TL;DR

This work tackles semantic deep hashing by removing the dependency on fixed, randomly assigned class centers. It introduces Center-Reassigned Hashing (CRH), an end-to-end framework that jointly learns a neural hash function and class hash centers by dynamically reassigning centers from a codebook of $M$ binary vectors in $\{-1,1\}^K$, with a multi-head extension of size $H$ to enhance semantic expressiveness. Centers are updated through a center reassignment step that minimizes the discrepancy between current hash codes and candidate centers, using either the Hungarian algorithm or greedy assignment, while the hash function is trained with a margin-based cross-entropy loss plus a quantization term. Extensive experiments on Stanford Cars, NABirds, and MS COCO demonstrate state-of-the-art retrieval performance and reveal that the multi-head codebook and dynamic center reassignment yield semantically meaningful centers, as evidenced by CLIP-based semantic alignment (PCC). The approach is efficient, scalable to large class counts, and adaptable to other modalities, offering a practical advancement for semantic hashing in large-scale retrieval systems.

Abstract

Hash center-based deep hashing methods improve upon pairwise or triplet-based approaches by assigning fixed hash centers to each class as learning targets, thereby avoiding the inefficiency of local similarity optimization. However, random center initialization often disregards inter-class semantic relationships. While existing two-stage methods mitigate this by first refining hash centers with semantics and then training the hash function, they introduce additional complexity, computational overhead, and suboptimal performance due to stage-wise discrepancies. To address these limitations, we propose Center-Reassigned Hashing (CRH), an end-to-end framework that dynamically reassigns hash centers from a preset codebook while jointly optimizing the hash function. Unlike previous methods, CRH adapts hash centers to the data distribution without explicit center optimization phases, enabling seamless integration of semantic relationships into the learning process. Furthermore, a multi-head mechanism enhances the representational capacity of hash centers, capturing richer semantic structures. Extensive experiments on three benchmarks demonstrate that CRH learns semantically meaningful hash centers and outperforms state-of-the-art deep hashing methods in retrieval tasks.
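
To make the center reassignment step more concrete, the following is a minimal, dependency-free sketch of the greedy variant. It is an illustration under stated assumptions rather than the authors' implementation: the function names (`class_mean_codes`, `discrepancy`, `greedy_reassign`), the use of per-class mean codes, and the negative inner product as the discrepancy measure are all hypothetical choices, since the summary above does not specify the exact cost. The only properties taken from the text are that centers are drawn from a codebook of binary vectors in $\{-1,1\}^K$ and that each class receives one center.

```python
def class_mean_codes(codes, labels, num_classes):
    """Average the current (relaxed) hash codes of each class."""
    K = len(codes[0])
    sums = [[0.0] * K for _ in range(num_classes)]
    counts = [0] * num_classes
    for code, y in zip(codes, labels):
        counts[y] += 1
        for k, value in enumerate(code):
            sums[y][k] += value
    # max(n, 1) avoids division by zero for classes with no samples.
    return [[s / max(n, 1) for s in row] for row, n in zip(sums, counts)]


def discrepancy(mean_code, center):
    """Assumed cost: negative inner product (lower means better agreement)."""
    return -sum(m * c for m, c in zip(mean_code, center))


def greedy_reassign(means, codebook):
    """Give every class a distinct codebook index, cheapest pairs first.

    Requires len(codebook) >= len(means), i.e. M >= number of classes.
    """
    pairs = sorted(
        (discrepancy(mean, center), c, m)
        for c, mean in enumerate(means)
        for m, center in enumerate(codebook)
    )
    assignment, used = {}, set()
    for _, c, m in pairs:
        if c not in assignment and m not in used:
            assignment[c] = m
            used.add(m)
    return [assignment[c] for c in range(len(means))]


# Toy example: K = 4 bits, M = 3 codebook entries, 3 classes.
codebook = [[1, 1, 1, 1], [-1, -1, -1, -1], [1, -1, 1, -1]]
codes = [[0.9, 0.8, 1.0, 0.7], [-0.9, -1.0, -0.8, -0.6], [0.8, -0.9, 0.7, -1.0]]
labels = [0, 1, 2]
means = class_mean_codes(codes, labels, num_classes=3)
print(greedy_reassign(means, codebook))  # prints [0, 1, 2]
```

The greedy pass is cheap but not guaranteed to minimize the total cost. For an optimal one-to-one assignment, the same class-by-codebook cost matrix could instead be passed to a Hungarian-style solver such as `scipy.optimize.linear_sum_assignment`.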

Paper Structure

This paper contains 38 sections, 9 equations, 6 figures, 12 tables, 1 algorithm.

Figures (6)

  • Figure 1: The overall framework of CRH. Top: Hamming space visualization of the iterative hash center reassignment across 3 stages: (1) initial or previous assignment, (2) hash code convergence, and (3) updated center assignment. Three colors represent three classes. Bottom: hash function training and multi-head update process for class $c$, where each head independently updates its sub-center $\mathbf{z}_m^h$ on a hash split, followed by concatenation into the full center $\mathbf{c}_c$.
  • Figure 2: mAP vs. PCC (64-bit) on different datasets. Arrows link baseline variants ("original" → "updated", e.g., MDS → MDS$_U$) and CRH-U → CRH-M → CRH. Regression lines with 95% confidence intervals indicate the linear trend.
  • Figure 3: Impact of hash center update frequency on mAP/PCC across datasets. "5 (20)" indicates updates every epoch for the first 20 epochs, then every 5; "interval=2" denotes updates every 2 epochs (similarly for other values); "$\infty$" means no updates.
  • Figure 4: mAP w.r.t. codebook size $M$ (left) and head dimension $d$ (right) on three benchmark datasets.
  • Figure 5: mAP scores (%) vs. the hyperparameter $\lambda$ values under 32/64-bit configurations across three datasets.
  • ...and 1 more figure