Deep Hashing via Householder Quantization

Lucas R. Schwengber, Lucas Resende, Paulo Orenstein, Roberto I. Oliveira

TL;DR

This work tackles the binarization bottleneck in deep hashing for image retrieval by decoupling similarity learning from quantization. It introduces Householder Hashing Quantization (H²Q), which first learns a continuous embedding with no quantization penalty (penalty weight $\lambda=0$) and then finds an orthogonal transform $U^\star \in O(k)$ that brings each coordinate of the embedding close to its sign before binarizing with the sign function. Because similarity measures are usually invariant under orthogonal transformations, this step leaves the learned similarity structure unchanged. The method uses a Householder-based decomposition to parametrize the orthogonal group efficiently and optimizes it with SGD, giving a fast, hyperparameter-free quantization step that can be layered on top of any existing deep hashing approach. Experiments on CIFAR-10, NUS-WIDE, MS COCO, and ImageNet report state-of-the-art performance and consistent improvements across multiple baselines and architectures, at a favorable computational cost.

Abstract

Hashing is at the heart of large-scale image similarity search, and recent methods have been substantially improved through deep learning techniques. Such algorithms typically learn continuous embeddings of the data. To avoid a subsequent costly binarization step, a common solution is to employ loss functions that combine a similarity learning term (to ensure similar images are grouped to nearby embeddings) and a quantization penalty term (to ensure that the embedding entries are close to binarized entries, e.g., -1 or 1). Still, the interaction between these two terms can make learning harder and the embeddings worse. We propose an alternative quantization strategy that decomposes the learning problem in two stages: first, perform similarity learning over the embedding space with no quantization; second, find an optimal orthogonal transformation of the embeddings so each coordinate of the embedding is close to its sign, and then quantize the transformed embedding through the sign function. In the second step, we parametrize orthogonal transformations using Householder matrices to efficiently leverage stochastic gradient descent. Since similarity measures are usually invariant under orthogonal transformations, this quantization strategy comes at no cost in terms of performance. The resulting algorithm is unsupervised, fast, hyperparameter-free and can be run on top of any existing deep hashing or metric learning algorithm. We provide extensive experimental results showing that this approach leads to state-of-the-art performance on widely used image datasets, and, unlike other quantization strategies, brings consistent improvements in performance to existing deep hashing algorithms.
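To make the second stage concrete, here is a minimal sketch of how a product of Householder reflections acts as an orthogonal transformation and why such a transformation can lower the quantization error without changing inner products. It is an illustration under stated assumptions, not the authors' implementation: the function names, the convention of mapping 0 to +1, and the squared-distance error are all chosen here for the example, and the SGD training of the reflection vectors is omitted.

```python
import math
import random

def householder_apply(v, x):
    """Apply the reflection H = I - 2 v v^T / (v^T v) to x without forming H."""
    scale = 2.0 * sum(vi * xi for vi, xi in zip(v, x)) / sum(vi * vi for vi in v)
    return [xi - scale * vi for vi, xi in zip(v, x)]

def orthogonal_apply(vs, x):
    """Apply a product of Householder reflections (one per vector in vs) to x."""
    for v in vs:
        x = householder_apply(v, x)
    return x

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def sign_hash(x):
    """Coordinate-wise sign; 0 is mapped to +1 (a convention assumed here)."""
    return [1 if xi >= 0 else -1 for xi in x]

def quantization_error(x):
    """Squared distance between x and its sign vector (illustrative choice)."""
    return sum((xi - s) ** 2 for xi, s in zip(x, sign_hash(x)))

# 1) Inner products survive any product of reflections, trained or not.
random.seed(0)
k = 4
x = [random.gauss(0, 1) for _ in range(k)]
y = [random.gauss(0, 1) for _ in range(k)]
vs = [[random.gauss(0, 1) for _ in range(k)] for _ in range(k)]  # untrained, random
ux, uy = orthogonal_apply(vs, x), orthogonal_apply(vs, y)
inner_product_gap = abs(dot(x, y) - dot(ux, uy))

# 2) A well-chosen reflection moves an embedding onto a hypercube vertex.
a = [math.sqrt(2.0), 0.0]                            # far from (+-1, +-1)
v = [math.sin(math.pi / 8), -math.cos(math.pi / 8)]  # normal to the line at angle pi/8
reflected = householder_apply(v, a)                  # lands (numerically) on (1, 1)

print(inner_product_gap)
print(quantization_error(a), quantization_error(reflected))
```

In the second part, the vector `a` and its image `reflected` have the same norm, so any similarity computed from inner products is unaffected, yet only `reflected` sits on a vertex of the hypercube where taking signs loses nothing. H²Q searches for such a transformation over the whole training set rather than picking it by hand as done here.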

Paper Structure

This paper contains 38 sections, 2 theorems, 46 equations, 5 figures, 7 tables, 1 algorithm.

Key Result

Theorem 3.1

A map $U: \mathbb{R}^k \to \mathbb{R}^k$ preserves inner products if and only if it is a linear orthogonal transformation.
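The easier direction of this statement can be checked in one line. As a sketch, write $U$ for an orthogonal matrix ($U^\top U = I$) and $H_v$ for a Householder reflection with unit vector $v$; this notation is assumed here for illustration and is not taken from the paper.

$$
\langle Ux, Uy \rangle = (Ux)^\top (Uy) = x^\top U^\top U y = x^\top y = \langle x, y \rangle .
$$

$$
H_v = I - 2 v v^\top, \quad \|v\|_2 = 1 \;\Longrightarrow\; H_v^\top H_v = I - 4 v v^\top + 4 v (v^\top v) v^\top = I .
$$

The second line shows that each Householder reflection is itself orthogonal, so any product of reflections is orthogonal and therefore preserves inner products as well.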

Figures (5)

  • Figure 1: Deep hashing methods usually train low-dimensional embedding maps and obtain hashes by taking the sign of the embeddings coordinate-wise. To avoid a lossy discretization, a two-term loss $L = L_S + \lambda L_Q$ is used, where $L_S$ is a similarity term and $L_Q$ is a quantization term. In contrast, we first let $\lambda = 0$, obtaining good similarity-preserving embeddings and then train an orthogonal transformation $U_\theta$ parametrized by $\theta \in \Theta$ to binarize the embedding via the coordinate-wise sign. As similarity losses are typically invariant under $U_{\theta}$, the term $L_S$ remains unchanged as an optimal discretization is found through the choice of $\theta$.
  • Figure 2: Variation of mAP@k in percentage points (p.p.) of each quantization strategy relative to no quantization ($\lambda = 0$) on VGG-16 with $k=16$ bits. Unlike ITQ, HWSD and quantization penalty terms ($\lambda > 0$), H²Q always increases the performance metric.
  • Figure 3: Increase in mAP@k in percentage points (p.p.) after using H²Q as a quantization strategy with different underlying losses, relative to no quantization ($\lambda=0$), on VGG-16 with $k=16$ bits. The increase is consistently positive and generally uniform across loss functions.
  • Figure 4: Training and prediction times for different numbers of bits. We vary the size of the training fold in $\{2\times 10^3, 2\times 10^4\}$ and predict $\{10^5, 10^6\}$ hashes, respectively. Training time is at most 3 minutes and prediction time is less than a second.
  • Figure 5: Example of a case where the unsupervised losses could decrease the $\texttt{mAP}$.

Theorems & Definitions (2)

  • Theorem 3.1
  • Theorem 3.2