Unsupervised Cross-Domain Image Retrieval via Prototypical Optimal Transport

Bin Li, Ye Shi, Qian Yu, Jingya Wang

TL;DR

Unsupervised Cross-Domain Image Retrieval (UCIR) aims to retrieve images of the same category across different domains without labels, and prior work typically treated intra-domain representation learning and cross-domain alignment separately. This paper introduces ProtoOT, a unified Prototypical Optimal Transport framework that couples intra-domain clustering with cross-domain alignment by incorporating K-means-derived prototypes as class marginals in OT and employing contrastive learning to foster both local semantic consistency and global discriminativeness. The method uses a memory-bank-based encoder with momentum updates, initializes prototypes via K-means, and optimizes a combined loss that addresses distribution imbalance and cross-domain supervision without labels. Across DomainNet and Office-Home, ProtoOT achieves substantial improvements over state-of-the-art methods, including an average P@200 gain of 18.17% on DomainNet and an average P@15 gain of 3.83% on Office-Home, demonstrating the practical impact of unifying representation learning and alignment under OT for UCIR.
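The P@200 and P@15 figures quoted above are precision-at-k scores. As a reading aid, here is a minimal sketch of how such a score is usually computed, assuming the standard definition (the fraction of the top-k retrieved images that share the query's category, averaged over queries). The function names and the toy labels are hypothetical and do not come from the paper.

```python
from typing import Hashable, Sequence


def precision_at_k(query_label: Hashable, ranked_labels: Sequence[Hashable], k: int) -> float:
    """Fraction of the top-k retrieved items whose category matches the query."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_labels[:k]
    hits = sum(1 for label in top_k if label == query_label)
    return hits / k


def mean_precision_at_k(query_labels, rankings, k: int) -> float:
    """Average P@k over all queries (assumed to be how a dataset-level score is formed)."""
    scores = [precision_at_k(q, r, k) for q, r in zip(query_labels, rankings)]
    return sum(scores) / len(scores)


# Toy example: category labels of the retrieved images, ordered by similarity.
queries = ["dog", "cat"]
rankings = [
    ["dog", "cat", "dog", "dog", "bird"],  # 3 of the top 5 are "dog"
    ["cat", "dog", "dog", "bird", "cat"],  # 2 of the top 5 are "cat"
]
p_dog = precision_at_k(queries[0], rankings[0], k=5)
mean_p = mean_precision_at_k(queries, rankings, k=5)
print(p_dog, mean_p)
```

In the cross-domain setting, the query would come from one domain (for example, a sketch) and the ranked list from another (for example, real photos); labels are used only for evaluation, not for training.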

Abstract

Unsupervised cross-domain image retrieval (UCIR) aims to retrieve images sharing the same category across diverse domains without relying on labeled data. Prior approaches have typically decomposed the UCIR problem into two distinct tasks: intra-domain representation learning and cross-domain feature alignment. However, these segregated strategies overlook the potential synergies between these tasks. This paper introduces ProtoOT, a novel Optimal Transport formulation explicitly tailored for UCIR, which integrates intra-domain feature representation learning and cross-domain alignment into a unified framework. ProtoOT leverages the strengths of the K-means clustering method to effectively manage distribution imbalances inherent in UCIR. By utilizing K-means for generating initial prototypes and approximating class marginal distributions, we modify the constraints in Optimal Transport accordingly, significantly enhancing its performance in UCIR scenarios. Furthermore, we incorporate contrastive learning into the ProtoOT framework to further improve representation learning. This encourages local semantic consistency among features with similar semantics, while also explicitly enforcing separation between features and unmatched prototypes, thereby enhancing global discriminativeness. ProtoOT surpasses existing state-of-the-art methods by a notable margin across benchmark datasets. Notably, on DomainNet, ProtoOT achieves an average P@200 enhancement of 18.17%, and on Office-Home, it demonstrates a P@15 improvement of 3.83%.
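To make the idea of K-means-derived class marginals concrete, the following is a rough sketch rather than the authors' implementation. It assumes an entropy-regularized OT problem solved with Sinkhorn iterations, a uniform marginal over samples, and a prototype marginal set to the K-means cluster proportions (standard OT for clustering would typically use a uniform prototype marginal instead). All function names, the regularization value `eps`, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every sample to every centroid.
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = x[labels == j].mean(0)
    return centroids, labels


def sinkhorn(cost, a, b, eps=0.05, iters=200):
    """Entropic OT: transport plan with row sums a and (approximately) column sums b."""
    kernel = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    return u[:, None] * kernel * v[None, :]


# Synthetic, deliberately imbalanced features: three classes with 60/30/10 samples.
rng = np.random.default_rng(0)
sizes = [60, 30, 10]
feats = np.concatenate(
    [np.eye(3)[c] + 0.1 * rng.normal(size=(n, 3)) for c, n in enumerate(sizes)]
)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

# Initial prototypes and estimated class proportions from K-means.
k = 3
protos, labels = kmeans(feats, k)
protos /= np.linalg.norm(protos, axis=1, keepdims=True)
counts = np.maximum(np.bincount(labels, minlength=k), 1)  # guard against empty clusters

a = np.full(len(feats), 1.0 / len(feats))  # uniform marginal over samples
b = counts / counts.sum()                  # prototype marginal from cluster sizes

cost = 1.0 - feats @ protos.T              # cosine distance to each prototype
plan = sinkhorn(cost, a, b)
pseudo_labels = plan.argmax(1)             # prototype receiving the most mass per sample
print(b, np.bincount(pseudo_labels, minlength=k))
```

With a uniform prototype marginal, the plan would be pushed to spread samples evenly across prototypes even when the underlying classes are imbalanced; using the estimated proportions for `b` relaxes that pressure, which is the intuition the abstract describes.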

Paper Structure (23 sections, 13 equations, 6 figures, 6 tables)

Figures (6)

  • Figure 1: Comparison between the standard OT and our proposed ProtoOT to deal with distribution imbalance. Different colors represent different categories, and the dashed lines indicate that the samples are matched to related prototypes.
  • Figure 2: An overview of the proposed method for UCIR. We utilize K-means to generate initial prototypes and approximate class marginal distributions, which modify the constraints of Optimal Transport. Both the intra-domain representation learning and the cross-domain alignment are based on ProtoOT. Furthermore, both aim to enhance the local consistency and global discriminability of features by employing the same form of contrastive loss.
  • Figure 3: Top 10 retrieval results in DomainNet and Office-Home. The green and red boxes denote correct and incorrect retrievals, respectively.
  • Figure 4: 2-D t-SNE visualizations of the feature representations learned by DD and our proposed ProtoOT on DomainNet. Each class is represented by a number, and each sample is colored by its corresponding domain.
  • Figure 5: Top 15 retrieval results in DomainNet. The green and red boxes denote correct and incorrect retrievals, respectively.
  • ...and 1 more figure