Towards Efficient Deep Hashing Retrieval: Condensing Your Data via Feature-Embedding Matching
Tao Feng, Jie Zhang, Huashan Liu, Zhijie Wang, Shengyuan Pang
TL;DR
The paper addresses the high training cost of Deep Hashing Retrieval (DHR) by proposing Information-intensive Feature-Embedding Matching (IEM), a dataset condensation method that performs distribution matching in the feature-embedding space and is strengthened by network and dataset augmentations. IEM is designed to preserve hashing performance: it formulates a distribution-matching objective $L_{IEM}$ that aligns the real and synthetic feature distributions, while the augmentations expand the diversity of the synthetic data. Experiments on CIFAR-10 and an ImageNet subset show that IEM achieves higher retrieval accuracy (mAP) than existing condensation methods for hash codes of length $K$, and that it condenses more efficiently, converging faster than baselines such as IDC and DM. IEM also generalizes across multiple DHR backbones (e.g., DPN, OrthoHash, DHD), which suggests it is broadly applicable to scalable hashing-based retrieval systems.
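To make the retrieval metric concrete, the following minimal sketch computes mAP by ranking database hash codes by Hamming distance to each query. It is an illustration under stated assumptions, not the paper's evaluation code: codes are lists of +1/-1 values of length $K$, each item has a single class label, ties are broken by database order, and the function names (`hamming`, `average_precision`, `mean_average_precision`) are hypothetical. The paper's exact protocol (for example, whether mAP is truncated at a top-k cutoff) is not stated in this summary.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length hash codes differ."""
    return sum(x != y for x, y in zip(a, b))


def average_precision(query_code: Sequence[int], query_label: int,
                      db_codes: List[Sequence[int]],
                      db_labels: List[int]) -> float:
    """AP of one query over the full Hamming ranking of the database."""
    # sorted() is stable, so ties keep their original database order.
    order = sorted(range(len(db_codes)),
                   key=lambda i: hamming(query_code, db_codes[i]))
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if db_labels[idx] == query_label:  # assumption: same label = relevant
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(query_codes: List[Sequence[int]],
                           query_labels: List[int],
                           db_codes: List[Sequence[int]],
                           db_labels: List[int]) -> float:
    """Mean of per-query AP; returns 0.0 when there are no queries."""
    if not query_codes:
        return 0.0
    aps = [average_precision(q, l, db_codes, db_labels)
           for q, l in zip(query_codes, query_labels)]
    return sum(aps) / len(aps)


# Toy example with K = 4 bits.
db_codes = [[1, 1, 1, 1], [1, 1, 1, -1], [-1, -1, -1, -1]]
db_labels = [0, 1, 0]
query_codes = [[1, 1, 1, 1], [-1, -1, -1, -1]]
query_labels = [0, 1]
print(mean_average_precision(query_codes, query_labels, db_codes, db_labels))
```

In this toy case the first query has AP 5/6 and the second has AP 1/2, so the mAP is 2/3.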
Abstract
Deep hashing retrieval is widely used in big data retrieval because of its robust feature extraction and efficient hashing process. However, training advanced deep hashing models has become more expensive because of complex optimizations and large datasets. Coreset selection and dataset condensation lower overall training costs by reducing the volume of training data without significantly compromising model accuracy on classification tasks. In this paper, we examine how mainstream dataset condensation methods perform for deep hashing retrieval and propose IEM (Information-intensive feature Embedding Matching), which is centered on distribution matching and incorporates model and data augmentation techniques to further enhance features in the hashing space. Extensive experiments demonstrate the superior performance and efficiency of our approach.
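As a rough illustration of what distribution matching in an embedding space means, the sketch below computes the squared distance between the mean embeddings of a real batch and a synthetic batch. This is the generic mean-matching form common to distribution-matching condensation, offered only as an assumption-laden sketch: it is not the paper's $L_{IEM}$, it omits the network and data augmentations, and `toy_embed` is a hypothetical stand-in for a feature-embedding network. In an actual condensation loop the synthetic samples would be updated by gradient descent on this loss, which is not shown here.

```python
from typing import Callable, List

Vector = List[float]


def mean_embedding(batch: List[Vector],
                   embed: Callable[[Vector], Vector]) -> Vector:
    """Average the embeddings of every sample in a non-empty batch."""
    feats = [embed(x) for x in batch]
    dim = len(feats[0])
    return [sum(f[d] for f in feats) / len(feats) for d in range(dim)]


def matching_loss(real_batch: List[Vector],
                  synthetic_batch: List[Vector],
                  embed: Callable[[Vector], Vector]) -> float:
    """Squared Euclidean distance between real and synthetic mean embeddings."""
    mu_real = mean_embedding(real_batch, embed)
    mu_syn = mean_embedding(synthetic_batch, embed)
    return sum((r - s) ** 2 for r, s in zip(mu_real, mu_syn))


def toy_embed(x: Vector) -> Vector:
    """Hypothetical embedding: a fixed linear map standing in for a network."""
    return [x[0] + x[1], x[0] - x[1]]


real = [[1.0, 0.0], [3.0, 2.0]]      # mean embedding is [3.0, 1.0]
good_synthetic = [[2.0, 1.0]]        # embeds to [3.0, 1.0], so loss is 0
poor_synthetic = [[0.0, 0.0]]        # embeds to [0.0, 0.0], so loss is 9 + 1

print(matching_loss(real, good_synthetic, toy_embed))
print(matching_loss(real, poor_synthetic, toy_embed))
```

A single synthetic sample whose embedding equals the real batch's mean embedding drives this loss to zero, which is the sense in which a small synthetic set can stand in for a larger real one under this objective.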
