PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval
Qiang Zou, Shuli Cheng, Jiayi Chen
TL;DR
PromptHash tackles semantic truncation and modal heterogeneity in cross-modal hashing by introducing affinity-prompted learning, an adaptive gated state-space fusion, and a Prompt Affinity Contrastive Learning (PACL) framework. The method uses text affinity prompts to preserve foreground semantics under CLIP's context limits, fuses image and prompt-rich text via a gated State Space Model, and aligns modalities with global-local prompt contrast and affinity-aware losses. Empirical results on MIRFLICKR-25K, NUS-WIDE, and MS COCO show substantial improvements over state-of-the-art methods, including an 18.22% (I2T) and 18.65% (T2I) gain on NUS-WIDE, and strong gains on the other datasets. The work introduces a new paradigm for cross-modal hashing that emphasizes semantic consistency, efficient fusion, and foreground-background discrimination, with publicly available code to enable reproducibility.
Abstract
Cross-modal hashing is a promising approach for efficient data retrieval and storage optimization. However, contemporary methods exhibit significant limitations in semantic preservation, contextual integrity, and information redundancy, which constrain retrieval efficacy. We present PromptHash, an end-to-end framework that leverages affinity prompt-aware collaborative learning for adaptive cross-modal hashing. Its main technical contributions are: (i) a text affinity prompt learning mechanism that preserves contextual information while maintaining parameter efficiency, (ii) an adaptive gated selection fusion architecture that combines a State Space Model with a Transformer network for precise cross-modal feature integration, and (iii) a prompt affinity alignment strategy that bridges modal heterogeneity through hierarchical contrastive learning. To the best of our knowledge, this study is the first to investigate affinity prompt awareness within collaborative cross-modal adaptive hash learning, establishing a paradigm for enhanced semantic consistency across modalities. In a comprehensive evaluation on three benchmark multi-label datasets, PromptHash demonstrates substantial performance improvements over existing approaches. Notably, on the NUS-WIDE dataset, our method achieves gains of 18.22% and 18.65% in image-to-text and text-to-image retrieval tasks, respectively. The code is publicly available at https://github.com/ShiShuMo/PromptHash.
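
For readers new to the retrieval setting, the following sketch shows the generic step that cross-modal hashing methods share: real-valued embeddings from each modality are binarized, and retrieval ranks database items by Hamming distance to the query code. It is an illustration only, not the PromptHash implementation; the function names (`binarize`, `hamming`, `rank_by_hamming`), the sign-based binarization, and the toy 4-bit embeddings are assumptions made for this example.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> int:
    """Pack sign(features) into an integer: bit i is 1 when features[i] > 0."""
    code = 0
    for i, value in enumerate(features):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed hash codes."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: List[int]) -> List[int]:
    """Return database indices sorted by ascending Hamming distance to the query."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


# Toy example (hypothetical values): one "image" query against three "text" items.
image_query = binarize([0.9, -0.2, 0.4, -0.7])
text_db = [
    binarize([0.8, -0.1, 0.3, -0.5]),   # same signs as the query
    binarize([-0.6, 0.2, 0.5, -0.1]),   # signs differ in two positions
    binarize([-0.3, 0.7, -0.4, 0.2]),   # signs differ in all four positions
]
ranking = rank_by_hamming(image_query, text_db)
```

Because comparison reduces to an XOR and a bit count, ranking cost is independent of the original embedding dimensionality, which is the source of the retrieval and storage efficiency the abstract refers to.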
