HiHPQ: Hierarchical Hyperbolic Product Quantization for Unsupervised Image Retrieval
Zexuan Qiu, Jiahong Liu, Yankai Chen, Irwin King
TL;DR
This work tackles unsupervised image retrieval by addressing two gaps: preserving multi-level semantic similarities and leveraging non-Euclidean geometry. It introduces HiHPQ, a Hierarchical Hyperbolic Product Quantization framework that embeds data in a Cartesian product of Lorentzian manifolds, uses a differentiable soft hyperbolic codebook, and optimizes with a hyperbolic contrastive objective. A hierarchical semantics learning module extracts pseudo hierarchies via clustering in tangent spaces and enforces both prototype-wise and instance-wise supervision on the hyperbolic embeddings. Empirical results on Flickr25K, NUS-WIDE, and CIFAR-10 demonstrate substantial gains over strong baselines, validating the benefits of combining hyperbolic geometry with hierarchical supervision for unsupervised product quantization.
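To make the "soft hyperbolic codebook on a product of Lorentzian manifolds" idea concrete, here is a minimal sketch in pure Python. It is not the authors' implementation. It assumes unit negative curvature, attention weights given by a softmax over negative squared Lorentzian distances, and codeword aggregation in the tangent space at the origin. The function names (`exp_map_origin`, `soft_quantize`, `product_quantize`) and the temperature `tau` are hypothetical choices for illustration.

```python
import math

def lorentz_inner(x, y):
    # Lorentzian inner product: -x0*y0 + sum_i xi*yi
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lorentz_distance(x, y):
    # Geodesic distance on the hyperboloid (curvature -1 assumed);
    # the clamp guards against rounding pushing the argument below 1.
    return math.acosh(max(-lorentz_inner(x, y), 1.0))

def exp_map_origin(v):
    # Map a Euclidean vector v (tangent vector at the origin) onto the hyperboloid.
    norm = math.sqrt(sum(a * a for a in v))
    if norm < 1e-12:
        return [1.0] + [0.0] * len(v)
    scale = math.sinh(norm) / norm
    return [math.cosh(norm)] + [scale * a for a in v]

def log_map_origin(x):
    # Inverse of exp_map_origin: hyperboloid point -> tangent vector at the origin.
    spatial = x[1:]
    norm = math.sqrt(sum(a * a for a in spatial))
    if norm < 1e-12:
        return [0.0] * len(spatial)
    dist = math.acosh(max(x[0], 1.0))
    return [dist * a / norm for a in spatial]

def soft_quantize(x, codebook, tau=0.5):
    # Assumed attention rule: softmax over negative squared Lorentzian distance.
    logits = [-lorentz_distance(x, c) ** 2 / tau for c in codebook]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Assumed aggregation: weighted mean of codewords in the tangent space
    # at the origin, mapped back onto the hyperboloid.
    tangents = [log_map_origin(c) for c in codebook]
    dim = len(tangents[0])
    mixed = [sum(w * t[i] for w, t in zip(weights, tangents)) for i in range(dim)]
    return exp_map_origin(mixed), weights

def product_quantize(subvectors, codebooks, tau=0.5):
    # One hyperboloid factor and one codebook per sub-vector (Cartesian product).
    return [soft_quantize(exp_map_origin(v), cb, tau)[0]
            for v, cb in zip(subvectors, codebooks)]
```

Because the weights are a smooth function of the distances, gradients can flow to both the embedding and the codewords; lowering `tau` pushes the soft assignment toward a hard nearest-codeword assignment.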
Abstract
Existing unsupervised deep product quantization methods primarily aim to increase the similarity between different views of the same image, while overlooking the subtle multi-level semantic similarities between images. Moreover, these methods predominantly operate in Euclidean space for computational convenience, which compromises their ability to map the multi-level semantic relationships between images effectively. To mitigate these shortcomings, we propose a novel unsupervised product quantization method dubbed Hierarchical Hyperbolic Product Quantization (HiHPQ), which learns quantized representations by incorporating hierarchical semantic similarity within hyperbolic geometry. Specifically, we propose a hyperbolic product quantizer, in which a hyperbolic codebook attention mechanism and quantized contrastive learning on the hyperbolic product manifold are introduced to expedite quantization. Furthermore, we propose a hierarchical semantics learning module, designed to enhance the distinction between similar and non-matching images for a query by using the extracted hierarchical semantics as additional training supervision. Experiments on benchmarks show that our proposed method outperforms state-of-the-art baselines.
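The contrastive objective on the product manifold can be illustrated with a small InfoNCE-style sketch. This is an assumption-laden illustration rather than the paper's loss: it takes the negative squared product-manifold distance (sum of squared Lorentzian distances over factors, curvature -1) as the similarity, and the names `lift`, `product_distance_sq`, `contrastive_loss`, and the temperature `tau` are hypothetical. The same function covers instance-wise targets (the quantized embedding of another view) and prototype-wise targets (the prototype an instance is assigned to), depending on what is passed as `targets`.

```python
import math

def lift(v):
    # Place a Euclidean vector on the hyperboloid by solving for the time coordinate.
    return [math.sqrt(1.0 + sum(a * a for a in v))] + list(v)

def lorentz_distance(x, y):
    # Geodesic distance on the hyperboloid with curvature -1 (assumed).
    inner = -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))
    return math.acosh(max(-inner, 1.0))

def product_distance_sq(xs, ys):
    # xs, ys: lists of M hyperboloid points, one per factor of the product manifold.
    return sum(lorentz_distance(x, y) ** 2 for x, y in zip(xs, ys))

def contrastive_loss(anchors, targets, positive_index, tau=0.2):
    # InfoNCE: each anchor should be closer to its positive target than to the rest.
    losses = []
    for anchor, pos in zip(anchors, positive_index):
        logits = [-product_distance_sq(anchor, t) / tau for t in targets]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_z - logits[pos])
    return sum(losses) / len(losses)
```

For example, with `anchors` as the embeddings of one augmented view and `targets` as the quantized embeddings of the other view, `positive_index` is simply `[0, 1, 2, ...]`; for prototype-wise supervision, `targets` would be the prototypes and `positive_index` the pseudo-labels obtained from clustering.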
