Towards Efficient Deep Hashing Retrieval: Condensing Your Data via Feature-Embedding Matching

Tao Feng, Jie Zhang, Huashan Liu, Zhijie Wang, Shengyuan Pang

TL;DR

The paper tackles the high training cost of Deep Hashing Retrieval (DHR) by proposing Information-intensive Feature-Embedding Matching (IEM), a dataset condensation method based on distribution matching in the feature-embedding space and enhanced by network and dataset augmentations. IEM explicitly targets preserving hashing performance, formulating a distribution-matching objective $L_{IEM}$ to align real and synthetic feature distributions while expanding synthetic diversity through augmentations. Empirical results on CIFAR-10 and ImageNet-subset show that IEM outperforms existing condensation methods in retrieval accuracy (mAP) for hashing codes of length $K$ and does so with superior condensation efficiency, converging faster than baselines like IDC and DM. Furthermore, IEM generalizes across multiple DHR backbones (e.g., DPN, OrthoHash, DHD), indicating broad applicability and practical impact for scalable hashing-based retrieval systems.
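As a rough illustration of what distribution matching in an embedding space can look like, the sketch below computes the squared distance between the mean embeddings of a real batch and a small synthetic batch, averaged over a few randomly perturbed embedding networks. This is not the paper's $L_{IEM}$: the one-layer ReLU embedding, the Gaussian weight perturbation standing in for network augmentation, and all names (`embed`, `matching_loss`, `perturb`) are assumptions made for the example.

```python
import random

def embed(x, weights):
    """Hypothetical embedding network: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

def mean_embedding(batch, weights):
    """Average embedding of a batch of input vectors."""
    embs = [embed(x, weights) for x in batch]
    return [sum(col) / len(embs) for col in zip(*embs)]

def matching_loss(real, synthetic, weights):
    """Squared distance between the mean embeddings of the two batches."""
    mean_real = mean_embedding(real, weights)
    mean_syn = mean_embedding(synthetic, weights)
    return sum((a - b) ** 2 for a, b in zip(mean_real, mean_syn))

def perturb(weights, rng, scale=0.1):
    """Assumed stand-in for network augmentation: add small Gaussian noise."""
    return [[w + scale * rng.gauss(0.0, 1.0) for w in row] for row in weights]

rng = random.Random(0)
dim_in, dim_emb = 8, 4
# Toy stand-ins for one class: many "real" samples, a handful of synthetic ones.
real = [[rng.gauss(1.0, 1.0) for _ in range(dim_in)] for _ in range(64)]
synthetic = [[rng.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(5)]
base = [[rng.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(dim_emb)]

# Average the loss over several perturbed copies of the embedding network.
losses = [matching_loss(real, synthetic, perturb(base, rng)) for _ in range(4)]
loss = sum(losses) / len(losses)
loss_identical = matching_loss(real, real, base)  # identical batches match exactly
```

In an actual condensation loop the synthetic samples would be updated by gradient descent on such a loss; the sketch only evaluates it, so that it stays dependency-free.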

Abstract

Deep hashing retrieval has gained widespread use in big data retrieval due to its robust feature extraction and efficient hashing process. However, training advanced deep hashing models has become more expensive due to complex optimizations and large datasets. Coreset selection and dataset condensation lower overall training costs by reducing the volume of training data without significantly compromising model accuracy on classification tasks. In this paper, we explore the effect of mainstream dataset condensation methods on deep hashing retrieval and propose IEM (Information-intensive feature Embedding Matching), which is centered on distribution matching and incorporates model and data augmentation techniques to further enhance the features of the hashing space. Extensive experiments demonstrate the superior performance and efficiency of our approach.
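Retrieval quality for hash codes is commonly summarized as mean average precision (mAP) over a Hamming-distance ranking. The sketch below shows one conventional way to compute it on toy $\pm 1$ codes of length $K = 4$. The function names and the toy database are hypothetical, and the paper's exact evaluation protocol (for example, any top-k cutoff) is not stated in this summary, so none is assumed here.

```python
def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

def average_precision(query_code, query_label, db_codes, db_labels):
    """AP of one query: rank the database by Hamming distance, then
    average the precision at every rank where a same-label item appears."""
    order = sorted(range(len(db_codes)),
                   key=lambda i: hamming(query_code, db_codes[i]))
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if db_labels[i] == query_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(q_codes, q_labels, db_codes, db_labels):
    """Mean of the per-query average precisions."""
    aps = [average_precision(c, l, db_codes, db_labels)
           for c, l in zip(q_codes, q_labels)]
    return sum(aps) / len(aps)

# Toy database: two classes whose codes are well separated.
db_codes = [[1, 1, 1, 1], [1, 1, 1, -1], [-1, -1, -1, -1], [-1, -1, 1, -1]]
db_labels = [0, 0, 1, 1]

# Two queries whose codes sit next to their own class.
good_map = mean_average_precision(
    [[1, 1, 1, 1], [-1, -1, -1, -1]], [0, 1], db_codes, db_labels)

# A query whose code sits next to the wrong class ranks its true matches last.
bad_ap = average_precision([1, 1, 1, 1], 1, db_codes, db_labels)
```

Here `good_map` is 1.0 because every query ranks its own class first, while `bad_ap` is 5/12 because the two relevant items appear only at ranks 3 and 4.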

Paper Structure (12 sections, 6 equations, 4 figures, 2 tables, 1 algorithm)

Figures (4)

  • Figure 1: (a) The classification performance of condensed data. (b) The retrieval performance of condensed data.
  • Figure 2: Illustration of our proposed method (IEM) for DHR. First, we introduce perturbations to the initialized model and augment the initialized synthetic data. Subsequently, we update the synthetic data by feature matching between the distributions of the original and synthetic data.
  • Figure 3: Comparison of performance across varying training times for condensation on CIFAR10 (a) and ImageNet20 (b). The performance metrics for achieving convergence and their associated times for CIFAR10 (c) and ImageNet20 (d). The diameters of the circles represent the time needed for convergence.
  • Figure 4: The generalization performance of IEM and IDC across DHR methods on CIFAR10 (a) and ImageNet10 (b).