High-level Codes and Fine-grained Weights for Online Multi-modal Hashing Retrieval

Yu-Wei Zhan, Xiao-Ming Wu, Xin Luo, Yinwei Wei, Xin-Shun Xu

TL;DR

This work addresses online multi-modal hashing under streaming and category-incremental conditions with a method called High-level Codes, Fine-grained Weights (HCFW). The approach separates hash-code learning (category-level high-level codes derived from semantic embeddings) from hash-function learning (per-modality linear projections), and fuses modalities with per-instance weights to exploit their complementary information. The key contributions are (i) a semantics-driven high-level code generation mechanism that keeps codes consistent across rounds during long-term learning, and (ii) a fine-grained weighting scheme that performs instance-level modality fusion and improves retrieval accuracy. On MIRFlickr and NUS-WIDE, HCFW is reported to achieve state-of-the-art mean average precision (MAP) in both the standard online and category-incremental settings, with favorable convergence and runtime characteristics. The authors also provide ablations and analysis to validate each component, and indicate that code will be released.

Abstract

In the real world, multi-modal data often appears in a streaming fashion, and there is a growing demand for similarity retrieval from such non-stationary data, especially at a large scale. In response to this need, online multi-modal hashing has gained significant attention. However, existing online multi-modal hashing methods suffer from two problems: hash codes become inconsistent during long-term learning, and different modalities are fused inefficiently. In this paper, we present a novel approach to supervised online multi-modal hashing, called High-level Codes, Fine-grained Weights (HCFW). HCFW addresses these problems along two dimensions: 1) Online Hashing Perspective. To ensure the long-term consistency of hash codes, especially in incremental learning scenarios, HCFW learns high-level codes derived from category-level semantics. These codes also handle the category-incremental challenge well. 2) Multi-modal Hashing Aspect. HCFW introduces fine-grained weights, which are generated at the instance level to fuse complementary multi-modal data and thereby improve overall hashing performance. Extensive experiments on two benchmark datasets demonstrate the effectiveness and efficiency of HCFW.
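
To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' formulation. It assumes a hypothetical scheme throughout: category codes are obtained by a fixed random projection of class embeddings followed by a sign function, each modality's hash function is a ridge-regression linear projection, and per-instance modality weights are a softmax over the mean magnitude of each modality's projected output. All function names, parameters, and the weighting rule are illustrative assumptions.

```python
import numpy as np


def category_codes(class_embeddings, n_bits, seed=0):
    """Assumed scheme: fixed random projection + sign gives one code per category.

    The projection depends only on the seed and the embedding dimension, so
    codes of existing categories do not change when new categories arrive.
    """
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((class_embeddings.shape[1], n_bits))
    return np.where(class_embeddings @ proj >= 0, 1.0, -1.0)


def fit_projection(features, target_codes, reg=1e-2):
    """Linear hash function for one modality, fitted by ridge regression."""
    d = features.shape[1]
    gram = features.T @ features + reg * np.eye(d)
    return np.linalg.solve(gram, features.T @ target_codes)


def instance_weights(projected, temperature=1.0):
    """Assumed rule: softmax over each modality's mean |projection| per instance."""
    conf = np.stack([np.abs(p).mean(axis=1) for p in projected], axis=1)
    conf = conf / temperature
    conf = conf - conf.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(conf)
    return w / w.sum(axis=1, keepdims=True)


def fused_codes(features_per_modality, projections):
    """Fuse per-modality projections with instance-level weights, then binarise."""
    projected = [x @ w for x, w in zip(features_per_modality, projections)]
    weights = instance_weights(projected)
    fused = sum(weights[:, [m]] * projected[m] for m in range(len(projected)))
    return np.where(fused >= 0, 1, -1).astype(np.int8), weights


# Toy usage with synthetic single-label data (all sizes are arbitrary).
rng = np.random.default_rng(0)
n, n_classes, n_bits = 200, 5, 16
class_emb = rng.standard_normal((n_classes, 32))
labels = rng.integers(0, n_classes, size=n)
targets = category_codes(class_emb, n_bits)[labels]   # instance targets from category codes
image_feats = rng.standard_normal((n, 64))
text_feats = rng.standard_normal((n, 20))
projections = [fit_projection(image_feats, targets), fit_projection(text_feats, targets)]
codes, weights = fused_codes([image_feats, text_feats], projections)
```

In this sketch the category codes are decoupled from the hash functions, which mirrors the separation described above: appending a new category's embedding leaves the codes of earlier categories untouched, so previously stored hash codes stay valid across rounds.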

Paper Structure (34 sections, 15 equations, 4 figures, 9 tables)

Figures (4)

  • Figure 1: The framework of the proposed HCFW. Without loss of generality, the training procedure of HCFW at the t-th round is illustrated. It contains two parts, i.e., high-level code generation and fine-grained weights.
  • Figure 2: MAP results versus rounds in the standard online scenario on NUS-WIDE.
  • Figure 3: Parameter sensitivity results on MIRFlickr.
  • Figure 4: Convergence analysis on MIRFlickr.