Text-Phase Synergy Network with Dual Priors for Unsupervised Cross-Domain Image Retrieval

Jing Yang; Hui Xue; Shipeng Zhu; Pengfei Fang

Abstract

This paper studies unsupervised cross-domain image retrieval (UCDIR), which aims to retrieve images of the same category across different domains without relying on labeled data. Existing methods typically use pseudo-labels derived from clustering algorithms as supervisory signals for intra-domain representation learning and cross-domain feature alignment. However, these discrete pseudo-labels often fail to provide accurate and comprehensive semantic guidance. Moreover, the alignment process frequently overlooks the entanglement between domain-specific and semantic information, which degrades the semantics of the learned representations and ultimately impairs retrieval performance. To address these limitations, this paper proposes a Text-Phase Synergy Network with Dual Priors (TPSNet). Specifically, we first employ CLIP to generate a set of class-specific prompts per domain, termed domain prompts, which serve as a text prior that offers more precise semantic supervision. In parallel, we introduce a phase prior, represented by domain-invariant phase features, which is integrated into the original image representations to bridge domain distribution gaps while preserving semantic integrity. Leveraging the synergy of these dual priors, TPSNet significantly outperforms state-of-the-art methods on UCDIR benchmarks.
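The phase prior builds on a widely used observation: in the 2D Fourier transform of an image, the phase spectrum carries most of the structural (and hence semantic) content, while the amplitude spectrum carries low-level appearance statistics that vary across domains. The sketch below illustrates only that decomposition, not TPSNet's phase feature encoder, whose details the abstract does not give. All function names are hypothetical, and random arrays stand in for images from two domains. It shows that replacing one image's amplitude with another's leaves its phase unchanged, which is the sense in which phase can be treated as domain-invariant.

```python
import numpy as np


def amplitude_phase(img):
    """Return the Fourier amplitude and phase of an (H, W) or (H, W, C) image,
    computed per channel over the two spatial axes."""
    spectrum = np.fft.fft2(img, axes=(0, 1))
    return np.abs(spectrum), np.angle(spectrum)


def recombine(amplitude, phase):
    """Inverse FFT of amplitude * exp(i * phase). When both parts come from real
    images, the imaginary part of the result is only rounding noise, so it is dropped."""
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * phase), axes=(0, 1)))


def phase_only(img):
    """Phase-only reconstruction: keep the phase, set every amplitude to 1."""
    _, phase = amplitude_phase(img)
    return recombine(np.ones_like(phase), phase)


def swap_amplitude(content, style):
    """Keep the phase of `content` but use the amplitude of `style`, a simple
    way to imitate a change of domain while preserving image structure."""
    amp_style, _ = amplitude_phase(style)
    _, phase_content = amplitude_phase(content)
    return recombine(amp_style, phase_content)


# Hypothetical stand-ins for one image from each of two domains.
rng = np.random.default_rng(0)
photo = rng.random((32, 32, 3))
sketch = rng.random((32, 32, 3))
mixed = swap_amplitude(photo, sketch)   # "photo" structure with "sketch" amplitude
structure = phase_only(photo)           # structure-only view of the photo
```

Phases are compared through `np.exp(1j * phase)` rather than directly, because angles near plus or minus pi can wrap to either sign without changing the underlying complex value.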

Paper Structure (40 sections, 17 equations, 10 figures, 13 tables)

Introduction
Related Work
Cross-Domain Image Retrieval.
Unsupervised Domain Alignment Methods.
Method
Overview
Domain Prompt Generation Module
Text-Phase Dual Priors Network
Text-Prior Semantic Feature Extraction.
Phase-Prior Domain-Invariant Feature Extraction.
Phase Feature Encoder.
Synergy of Text-Phase Dual Priors.
Experiments
Datasets and Setting
Datasets.
...and 25 more sections

Figures (10)

Figure 1: Comparison between (a) existing methods and (b) our proposed TPSNet. Existing methods rely on inaccurate pseudo-labels for intra-domain and cross-domain learning, often causing semantic loss. In contrast, TPSNet leverages text and phase dual priors to extract domain-invariant semantic features.
Figure 2: The pipeline of TPSNet. Left: domain prompt generation via the prompt learning paradigm. Top-right: text-phase dual prior construction with contrastive learning for unsupervised cross-domain image retrieval. Bottom: the detailed architecture of the proposed phase feature encoder.
Figure 3: Average accuracy (%) of UCDIR methods using (a) ResNet-50 and (b) ViT-B as image encoders.
Figure 4: t-SNE visualizations of last-layer features for the baseline model (v1), baseline with text prior (v2), and TPSNet (v3) across two scenarios from two datasets.
Figure 5: Grad-CAM visualizations of last-layer features for the baseline model (v1), baseline with text prior (v2), and TPSNet (v3) on randomly selected samples.
...and 5 more figures
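Figures 1 and 2 contrast clustering pseudo-labels with a text prior used inside contrastive learning. One generic way to use class prompt embeddings as semantic anchors is sketched below. Each image is softly assigned to classes by cosine similarity to the prompt embeddings (the CLIP zero-shot recipe), and a prototype-style InfoNCE loss then pulls each image toward its assigned prompt and away from the others. This is an illustrative assumption, not TPSNet's actual objective. The function names, the temperatures (0.01 and 0.07), and the use of one-hot vectors in place of real CLIP image and text embeddings are all hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def prompt_logits(image_feats, prompt_feats, tau):
    """Cosine similarity between each image (N, D) and each class prompt (K, D), divided by tau."""
    return l2_normalize(image_feats) @ l2_normalize(prompt_feats).T / tau


def text_prior_assign(image_feats, prompt_feats, tau=0.01):
    """Soft class distribution per image (CLIP zero-shot style) and its hard argmax."""
    probs = np.exp(log_softmax(prompt_logits(image_feats, prompt_feats, tau)))
    return probs, probs.argmax(axis=1)


def prototype_info_nce(image_feats, prompt_feats, labels, tau=0.07):
    """InfoNCE with prompt embeddings as class anchors: the assigned prompt is the
    positive and every other prompt is a negative."""
    log_probs = log_softmax(prompt_logits(image_feats, prompt_feats, tau))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


# Hypothetical stand-ins: 4 orthogonal "prompt" embeddings in 8 dimensions and
# 5 noisy "image" embeddings drawn near prompts 0, 1, 2, 3, 0.
rng = np.random.default_rng(0)
prompts = np.eye(4, 8)
images = prompts[[0, 1, 2, 3, 0]] + 0.1 * rng.normal(size=(5, 8))
probs, labels = text_prior_assign(images, prompts)
loss = prototype_info_nce(images, prompts, labels)
```

Unlike a one-hot cluster index, `probs` keeps graded similarity to every class. This is one way a text prior can provide richer supervision than discrete pseudo-labels.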
