ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval
Kaipeng Fang, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Zhi-Qi Cheng, Xiyao Li, Heng Tao Shen
TL;DR
This work tackles Universal Cross-Domain Retrieval (UCDR) by introducing ProS, a prompt-tuning framework that simulates generalized knowledge through Content-aware Dynamic Prompts (CaDP). ProS comprises two stages: Prompt Units Learning, which builds domain and semantic prompt units using a mask-and-align strategy, and Context-aware Simulator Learning, which trains a Content-aware Prompt Simulator (CaPS) to generate CaDP under simulated test conditions. CaDP then steers the CLIP image encoder to produce more generalizable embeddings for unseen domains and categories, enabling robust retrieval. Extensive experiments on DomainNet, Sketchy, and TU-Berlin demonstrate state-of-the-art performance with a modest parameter budget, and ablation studies confirm the importance of each component. The approach offers a practical, scalable path to open-set cross-domain search, and the code is publicly available.
Abstract
The goal of Universal Cross-Domain Retrieval (UCDR) is to achieve robust performance in generalized test scenarios, wherein test data may belong to domains and categories that are strictly unseen during training. Recently, pre-trained models with prompt tuning have shown strong generalization capabilities and attained noteworthy achievements in various downstream tasks, such as few-shot learning and video-text retrieval. However, applying them directly to UCDR may not be sufficient to handle both domain shift (i.e., adapting to unfamiliar domains) and semantic shift (i.e., transferring to unknown categories). To this end, we propose Prompting-to-Simulate (ProS), the first method to apply prompt tuning to UCDR. ProS employs a two-step process to simulate Content-aware Dynamic Prompts (CaDP), which guide models to produce generalized features for UCDR. Concretely, in the Prompt Units Learning stage, we introduce two Prompt Units to individually capture domain and semantic knowledge in a mask-and-align way. Then, in the Context-aware Simulator Learning stage, we train a Content-aware Prompt Simulator under a simulated test scenario to produce the corresponding CaDP. Extensive experiments conducted on three benchmark datasets show that our method achieves new state-of-the-art performance without introducing excessive parameters. Our method is publicly available at https://github.com/fangkaipeng/ProS.
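To make the retrieval setting concrete, the following is a minimal sketch of how a cross-domain query could be matched against a gallery once embeddings are available. It is not the ProS implementation: the function names (`cosine`, `retrieve`, `precision_at_k`), the two-dimensional toy embeddings, and the use of cosine similarity with Precision@k are all assumptions made for illustration. In practice the embeddings would come from the prompted image encoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query, gallery, k):
    """Return indices of the k gallery embeddings most similar to the query."""
    ranked = sorted(
        range(len(gallery)),
        key=lambda i: cosine(query, gallery[i]),
        reverse=True,
    )
    return ranked[:k]

def precision_at_k(retrieved, gallery_labels, query_label):
    """Fraction of retrieved items whose label matches the query label."""
    hits = sum(1 for i in retrieved if gallery_labels[i] == query_label)
    return hits / len(retrieved)

# Toy data (hypothetical): gallery embeddings from one domain,
# a query embedding from another domain, both in a shared space.
gallery = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
gallery_labels = ["cat", "cat", "dog", "dog"]
query = [0.8, 0.2]

top2 = retrieve(query, gallery, k=2)
print(top2, precision_at_k(top2, gallery_labels, "cat"))
```

In this toy setup the query is closest to the two "cat" gallery items, so both retrieved results are correct. Under domain or semantic shift, the quality of such a ranking depends entirely on how well the encoder maps unseen domains and categories into the shared embedding space, which is the problem the prompts are meant to address.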
