Leveraging High-Resolution Features for Improved Deep Hashing-based Image Retrieval

Aymene Berriche, Mehdi Adjal Zakaria, Riyadh Baghdadi

TL;DR

This work proposes the High-Resolution Hashing Network (HHNet), a deep hashing method that uses High-Resolution Networks (HRNets) as its backbone and outperforms existing methods on all tested benchmark datasets.

Abstract

Deep hashing techniques have emerged as the predominant approach for efficient image retrieval. Traditionally, these methods use pre-trained convolutional neural networks (CNNs) such as AlexNet and VGG-16 as feature extractors. However, as datasets grow more complex, these backbone architectures struggle to capture the meaningful features that effective image retrieval requires. In this study, we explore the efficacy of high-resolution features learned with state-of-the-art techniques for image retrieval. Specifically, we propose a novel methodology that uses High-Resolution Networks (HRNets) as the backbone for the deep hashing task, termed the High-Resolution Hashing Network (HHNet). Our approach outperforms existing methods on all tested benchmark datasets, including CIFAR-10, NUS-WIDE, MS COCO, and ImageNet. The improvement is more pronounced on complex datasets, which highlights the need to learn high-resolution features for intricate image retrieval tasks. Furthermore, we conduct a comprehensive analysis of different HRNet configurations and provide insights into the optimal architecture for the deep hashing task.
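To make the retrieval side of deep hashing concrete, the sketch below shows how binary codes are typically compared. It assumes, as is common in deep hashing but not stated here for HHNet specifically, that a hashing layer emits K real values per image, which are binarized with the sign function into {-1, +1} codes. For such codes the Hamming distance follows from the inner product as d_H = (K - <a, b>) / 2. The helper names `binarize`, `hamming_distance`, and `rank_database` are hypothetical and only illustrate the idea.

```python
from typing import List, Sequence, Tuple


def binarize(features: Sequence[float]) -> List[int]:
    """Quantize continuous hashing-layer outputs to a {-1, +1} code via the sign function.

    Assumption: zero is mapped to +1 so that every bit is defined.
    """
    return [1 if x >= 0 else -1 for x in features]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Hamming distance between two {-1, +1} codes of length K: (K - <a, b>) / 2."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    inner = sum(x * y for x, y in zip(a, b))
    # inner = K - 2 * d_H, so (K - inner) is always even and integer division is exact.
    return (len(a) - inner) // 2


def rank_database(query: Sequence[int],
                  database: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Return (index, distance) pairs sorted by increasing Hamming distance, ties by index."""
    scored = [(i, hamming_distance(query, code)) for i, code in enumerate(database)]
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    # Hypothetical 4-bit hashing-layer outputs for one query and three database images.
    query = binarize([0.9, -0.2, 0.4, -0.7])
    database = [
        binarize([0.8, -0.1, 0.3, -0.5]),
        binarize([-0.6, 0.2, 0.1, -0.9]),
        binarize([-0.3, 0.5, -0.2, 0.4]),
    ]
    print(rank_database(query, database))  # [(0, 0), (1, 2), (2, 4)]
```

Because each code is only K bits, a whole database can be ranked with cheap integer operations. A production system would typically pack the bits into machine words and use XOR plus popcount instead of this list-based form.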

Paper Structure

This paper contains 14 sections, 4 equations, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: Illustration of the hierarchical structure of a high-resolution network (HRNet). The network consists of four stages: the first stage uses high-resolution convolutions, and each subsequent stage repeats multi-resolution blocks that handle information at varying resolutions. This hierarchical design preserves spatial detail and makes effective use of it throughout the network's processing pipeline (Sun et al., 2019).
  • Figure 2: Illustration of the components added to form the augmented HRNet architecture (Sun et al., 2019). The feature maps at the four resolutions are first fed into a bottleneck, which increases their channel counts to 128, 256, 512, and 1024, respectively. Next, the highest-resolution representation is downsampled by a 3x3 convolution with stride 2 that outputs 256 channels, and the result is added to the second-highest-resolution representation. This process is repeated down the resolution hierarchy, yielding 1024 channels at the lowest resolution. Finally, a 1x1 convolution maps the 1024 channels to 2048, followed by global average pooling.
  • Figure 3: The additional hashing layers integrated into the augmented HRNet architecture to generate image hash codes from high-resolution features.
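The channel and resolution bookkeeping described for Figure 2 can be checked with a small shape tracer. This is a sketch under stated assumptions, not the authors' code: it assumes a 224x224 input and the usual HRNet branch resolutions of 1/4, 1/8, 1/16, and 1/32 of the input, and it models the 3x3 stride-2 convolution with padding 1. The function names `conv_out` and `trace_head` are hypothetical.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Standard convolution output-size formula for one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_head(input_size: int = 224,
               bottleneck_channels: Tuple[int, ...] = (128, 256, 512, 1024),
               final_channels: int = 2048) -> List[Tuple[str, Shape]]:
    # Assumption: the four branches sit at 1/4, 1/8, 1/16 and 1/32 of the input size.
    sizes = [input_size // s for s in (4, 8, 16, 32)]
    branches = [(c, s, s) for c, s in zip(bottleneck_channels, sizes)]
    steps = [(f"bottleneck branch {i + 1}", shape) for i, shape in enumerate(branches)]

    current = branches[0]
    for i in range(1, len(branches)):
        _, h, w = current
        # 3x3 conv, stride 2, padding 1: halves H and W and outputs the next branch's channels.
        down = (branches[i][0], conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1))
        if down != branches[i]:
            raise ValueError(f"shape mismatch at branch {i + 1}: {down} vs {branches[i]}")
        current = down  # element-wise addition leaves the shape unchanged
        steps.append((f"downsample + add into branch {i + 1}", current))

    _, h, w = current
    # A 1x1 conv changes channels only; global average pooling collapses H x W to 1 x 1.
    steps.append(("1x1 conv", (final_channels, h, w)))
    steps.append(("global average pooling", (final_channels, 1, 1)))
    return steps


if __name__ == "__main__":
    for name, shape in trace_head():
        print(f"{name:<36} {shape}")
```

With these assumptions the lowest-resolution map is 1024x7x7 before the 1x1 convolution, and global average pooling yields a 2048-dimensional vector. The explicit shape check mirrors the caption's requirement that each downsampled map can be added element-wise to the next branch.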