Domain Generalization via Optimal Transport with Metric Similarity Learning

Fan Zhou, Zhuqing Jiang, Changjian Shui, Boyu Wang, Brahim Chaib-draa

TL;DR

The paper tackles domain generalization by learning invariant features across multiple source domains to generalize to unseen targets. It introduces Wasserstein Adversarial Domain Generalization (WADG), which combines optimal transport-based feature alignment using the Wasserstein distance $W_1$ with a metric-learning objective to enforce domain-agnostic, discriminative boundaries. OT keeps class-conditional features cohesive across domains, while the metric loss promotes separability between classes, which sharpens the classification boundary. Experiments on VLCS, PACS, and Office-Home show improvements over most baselines, with ablations validating the contributions of both the OT alignment and the metric-learning component.

Abstract

Generalizing knowledge to unseen domains, where data and labels are unavailable, is crucial for machine learning models. We tackle the domain generalization problem: learning from multiple source domains and generalizing to a target domain with unknown statistics. The crucial idea is to extract the underlying invariant features across all the domains. Previous domain generalization approaches mainly focused on learning invariant features and stacking the learned features from each source domain to generalize to a new target domain while ignoring the label information, which leads to indistinguishable features with an ambiguous classification boundary. One possible solution is to constrain the label similarity when extracting the invariant features and to exploit label similarities for class-specific cohesion and separation of features across domains. We therefore adopt optimal transport with the Wasserstein distance, which can constrain the class-label similarity, for adversarial training, and further deploy a metric learning objective that leverages the label information to achieve a distinguishable classification boundary. Empirical results show that the proposed method can outperform most of the baselines. Furthermore, ablation studies demonstrate the effectiveness of each component of our method.
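
For readers unfamiliar with the distance used for alignment, the standard definition of the Wasserstein-1 distance and its Kantorovich-Rubinstein dual can be written as below. The symbols $P$, $Q$, $\gamma$, and $f$ are introduced here for illustration and may not match the paper's own notation. The dual form is what makes a critic-based adversarial estimate possible, with $f$ playing the role of the critic.

```latex
W_1(P, Q)
  = \inf_{\gamma \in \Pi(P, Q)} \mathbb{E}_{(x, y) \sim \gamma}\big[\lVert x - y \rVert\big]
  = \sup_{\lVert f \rVert_{L} \le 1} \mathbb{E}_{x \sim P}\big[f(x)\big] - \mathbb{E}_{y \sim Q}\big[f(y)\big]
```

Here $\Pi(P, Q)$ is the set of couplings whose marginals are $P$ and $Q$, and the supremum ranges over 1-Lipschitz functions $f$.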

Paper Structure

This paper contains 19 sections, 16 equations, 5 figures, 7 tables, 1 algorithm.

Figures (5)

  • Figure 1: Domain Generalization: A learner faces a set of labelled data from several source domains and aims to extract invariant features across the seen source domains and to generalize to an unseen domain. Based on the manifold assumption goldberg2009multi, each domain $i$ is supported by a distribution $\mathcal{D}_i$. The learner can measure the source domain distributions via the source datasets but has no information on the unseen target distribution. After training on the source domains, the model is deployed to a new domain $\mathcal{D}_t$ for prediction.
  • Figure 2: Using optimal transport (OT) for domain generalization: Predicting directly on the unseen domain (the white dashed arrow) is typically difficult. To learn domain-invariant features, as shown in the direction of the green arrow, we adopt the OT technique to align the domains and extract invariant features. After the OT transition, the invariant features can be generalized to the unseen domain.
  • Figure 3: The proposed WADG method. (a): The general workflow of the WADG method. The model mainly consists of three parts: the feature extractor, the classifier, and the critic function. During training, the model receives all the source domains. The feature extractor is trained together with the critic function in an adversarial manner to learn invariant features. (b): For each pair of source domains $\mathcal{D}_i$ and $\mathcal{D}_j$, the optimal transport process aligns the features from the different domains. (c): The metric learning process. For a batch of instances from all source domains, we first roughly mine the positive and negative pairs via Eq. \ref{eq_round_mining}. Then we compute the corresponding weights via Eq. \ref{Eq.negative_pair_weight} and Eq. \ref{Eq.positive_pair_weight} to obtain $\mathcal{L}_{MS}$, which guides the clustering process.
  • Figure 4: t-SNE visualization of the ablation studies on the PACS dataset with Photo as the target domain. Detailed analysis is presented in section \ref{further_analysis}.
  • Figure 5: t-SNE visualization of the ablation studies on the VLCS dataset with Caltech as the target domain. Detailed analysis is presented in section \ref{further_analysis}.
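
The metric learning step described in the Figure 3(c) caption (rough pair mining followed by pair weighting) can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes $\mathcal{L}_{MS}$ follows the general shape of a multi-similarity loss on cosine similarities, and the function name `multi_similarity_loss`, the hyperparameters `alpha`, `beta`, `lam`, `eps`, and their default values are hypothetical choices made for illustration. The pair weights are not computed explicitly here; in this formulation they arise implicitly as gradients of the log-sum-exp terms.

```python
import numpy as np


def multi_similarity_loss(features, labels, alpha=2.0, beta=50.0, lam=0.5, eps=0.1):
    """Illustrative multi-similarity style loss over one mixed-domain batch.

    All hyperparameter values are assumptions, not taken from the paper.
    """
    # Cosine similarity between every pair of L2-normalised feature vectors.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T

    n = len(labels)
    same = labels[:, None] == labels[None, :]
    pos_mask = same & ~np.eye(n, dtype=bool)  # same class, excluding self-pairs
    neg_mask = ~same                          # different class

    losses = []
    for i in range(n):
        pos = sim[i][pos_mask[i]]
        neg = sim[i][neg_mask[i]]
        if pos.size == 0 or neg.size == 0:
            continue  # anchor has no positive or no negative in this batch

        # Rough mining: keep negatives that are nearly as similar as the
        # hardest positive, and positives that are nearly as dissimilar as
        # the hardest negative (eps is the mining margin).
        hard_neg = neg[neg > pos.min() - eps]
        hard_pos = pos[pos < neg.max() + eps]
        if hard_pos.size == 0 or hard_neg.size == 0:
            continue  # no informative pairs survive mining for this anchor

        pos_term = np.log1p(np.sum(np.exp(-alpha * (hard_pos - lam)))) / alpha
        neg_term = np.log1p(np.sum(np.exp(beta * (hard_neg - lam)))) / beta
        losses.append(pos_term + neg_term)

    # Average over anchors that contributed; zero if none did.
    return float(np.mean(losses)) if losses else 0.0


# Toy batch: two tight clusters in 2-D.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
good = multi_similarity_loss(feats, np.array([0, 0, 1, 1]))  # labels match clusters
bad = multi_similarity_loss(feats, np.array([0, 1, 0, 1]))   # labels cut across clusters
print(good, bad)
```

When the labels match the clusters, no pair survives mining and the loss is zero; when the labels cut across clusters, hard pairs remain and the loss is positive, which is the signal that would pull same-class features together and push different-class features apart.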