RREH: Reconstruction Relations Embedded Hashing for Semi-Paired Cross-Modal Retrieval

Jianzong Wang, Haoxiang Shi, Kaiyi Luo, Xulong Zhang, Ning Cheng, Jing Xiao

TL;DR

RREH tackles semi-paired cross-modal retrieval with an unsupervised hashing framework that preserves high-order data relations via reconstruction factors learned from anchors. It combines reconstruction-factor learning, a shared latent subspace, and reconstruction relations embedding into a unified discrete optimization that produces discriminative hash codes while avoiding large Laplacian matrices. The method demonstrates strong accuracy and scalability on MIRFlickr and NUS-WIDE under semi-paired settings, outperforming several baselines and approaching fully supervised performance in some cases. By reducing the need for full cross-modal correspondence, the approach enables efficient large-scale cross-modal retrieval with robust feature learning and binary hashing.
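To make the "binary hashing" step concrete, here is a minimal sketch of how retrieval with hash codes typically works once real-valued latent features are available. It is not taken from the paper: the function names (`binarize`, `hamming_distances`, `rank_by_hamming`) are hypothetical, and simple sign thresholding stands in for RREH's discrete optimization, which this summary does not spell out.

```python
import numpy as np

def binarize(latent):
    """Map real-valued latent features to {-1, +1} codes by sign thresholding.

    Assumption: plain thresholding at zero, used here only for illustration.
    """
    return np.where(np.asarray(latent) >= 0, 1, -1).astype(np.int8)

def hamming_distances(query_code, db_codes):
    """Hamming distance between one code and every row of db_codes.

    For codes in {-1, +1} of length r, d = (r - <q, b>) / 2.
    """
    r = db_codes.shape[1]
    inner = db_codes.astype(np.int32) @ query_code.astype(np.int32)
    return (r - inner) // 2

def rank_by_hamming(query_code, db_codes):
    """Return database indices sorted from nearest to farthest in Hamming space."""
    return np.argsort(hamming_distances(query_code, db_codes), kind="stable")
```

In a cross-modal setting, the query code would come from one modality (for example, text) and the database codes from another (for example, images); because both live in the same Hamming space, ranking reduces to cheap integer arithmetic.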

Abstract

Known for efficient computation and easy storage, hashing has been extensively explored in cross-modal retrieval. Most current hashing models assume a direct one-to-one correspondence between data points across modalities. In practice, however, this correspondence may be only partially available. In this research, we introduce an unsupervised hashing technique designed for semi-paired cross-modal retrieval tasks, named Reconstruction Relations Embedded Hashing (RREH). RREH assumes that multi-modal data share a common subspace. For paired data, RREH explores the latent consistent information of heterogeneous modalities by seeking a shared representation. For unpaired data, to effectively capture the latent discriminative features, the high-order relationships between unpaired data and anchors, computed by efficient linear reconstruction, are embedded into the latent subspace. The anchors are sampled from paired data, which improves the efficiency of hash learning. RREH learns the underlying features and the binary codes in a unified framework that preserves the high-order reconstruction relations. With a well-devised objective function and discrete optimization algorithm, RREH is designed to be scalable, making it suitable for large-scale datasets and efficient cross-modal retrieval. In the evaluation, the proposed method is tested with partially paired data to establish its superiority over several existing methods.
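The "efficient linear reconstruction" of unpaired data from anchors can be illustrated with a short sketch. The abstract does not give the exact objective, so the code below assumes a ridge-regularized least-squares formulation, $\min_Z \|X - ZA\|_F^2 + \lambda \|Z\|_F^2$, which has the closed-form solution $Z = XA^\top (AA^\top + \lambda I)^{-1}$. The names `reconstruction_factors`, `embed_unpaired`, and `reg` are hypothetical, and mapping unpaired samples into the latent space as `Z @ anchor_latent` is one plausible reading of "embedding reconstruction relations", not the paper's confirmed update rule.

```python
import numpy as np

def reconstruction_factors(X, anchors, reg=1e-3):
    """Reconstruct each row of X as a linear combination of anchor rows.

    Assumed objective (not confirmed by the source):
        min_Z ||X - Z A||_F^2 + reg * ||Z||_F^2
    X: (n, d) unpaired samples, anchors: (m, d). Returns Z with shape (n, m).
    """
    m = anchors.shape[0]
    gram = anchors @ anchors.T + reg * np.eye(m)  # (m, m), small when m << n
    # Solve gram @ Z.T = A @ X.T rather than forming an explicit inverse.
    return np.linalg.solve(gram, anchors @ X.T).T

def embed_unpaired(Z, anchor_latent):
    """Carry the reconstruction relations into the latent space.

    Assumption: an unpaired sample's latent vector is the same linear
    combination of the anchors' latent vectors. anchor_latent: (m, r).
    """
    return Z @ anchor_latent
```

Because the linear system is only m-by-m (the number of anchors), the cost grows linearly with the number of samples, which is consistent with the abstract's claim that anchor-based reconstruction keeps hash learning efficient.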

Paper Structure

This paper contains 18 sections, 17 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The overall workflow of RREH.
  • Figure 2: The precision-recall curves of semi-paired models on MIRFlickr.
  • Figure 3: Parameter analysis of $\beta$ and $\theta$ on MIRFlickr.