Generalizing to Unseen Domains with Wasserstein Distributional Robustness under Limited Source Knowledge

Jingge Wang, Liyan Xie, Yao Xie, Shao-Lun Huang, Yang Li

TL;DR

This paper proposes Wasserstein Distributionally Robust Domain Generalization (WDRDG), a novel domain generalization framework inspired by distributionally robust optimization. It encourages robustness over conditional distributions within class-specific Wasserstein uncertainty sets and optimizes the worst-case performance of a classifier over these uncertainty sets.

Abstract

Domain generalization aims at learning a universal model that performs well on unseen target domains by incorporating knowledge from multiple source domains. In this research, we consider the scenario where different domain shifts occur among conditional distributions of different classes across domains. When labeled samples in the source domains are limited, existing approaches are not sufficiently robust. To address this problem, we propose a novel domain generalization framework called Wasserstein Distributionally Robust Domain Generalization (WDRDG), inspired by the concept of distributionally robust optimization. We encourage robustness over conditional distributions within class-specific Wasserstein uncertainty sets and optimize the worst-case performance of a classifier over these uncertainty sets. We further develop a test-time adaptation module that leverages optimal transport to quantify the relationship between the unseen target domain and the source domains, enabling adaptive inference for target data. Experiments on the Rotated MNIST, PACS, and VLCS datasets demonstrate that our method can effectively balance robustness and discriminability in challenging generalization scenarios.
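
In generic form, the class-specific uncertainty sets and the worst-case objective described above can be sketched as follows. The symbols $B_k^\ast$ (class-$k$ Wasserstein barycenter) and $\theta_k$ (class-$k$ radius) follow the notation of the theorem and figure captions on this page; $\mathcal{P}_k$, $\Psi$, and the unspecified order of the Wasserstein distance $W$ are placeholder notation assumed here for illustration, not the paper's exact formulation.

$$
\mathcal{P}_k = \{\, P : W(P, B_k^\ast) \le \theta_k \,\}, \quad k = 1, \ldots, K,
\qquad
\min_{\phi} \; \max_{P_1 \in \mathcal{P}_1, \ldots, P_K \in \mathcal{P}_K} \Psi(\phi; P_1, \ldots, P_K)
$$

Here $\phi$ is the classifier and $\Psi$ stands for some classification risk evaluated under the class-conditional distributions $P_1, \ldots, P_K$. The inner maximization selects the least favorable distributions inside the uncertainty sets, and the outer minimization chooses the classifier that performs best against them.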

Paper Structure (16 sections, 3 theorems, 18 equations, 7 figures, 3 tables, 1 algorithm)

Key Result

Theorem 1

Suppose the Wasserstein barycenter $B_k^\ast$ for each class as defined in (eq:barycenter) is supported on $b$ samples. Let $S_b$ be the union of the support of $\{B_1^\ast,\ldots,B_K^\ast\}$ which contains ${n_b}=Kb$ samples $\{\boldsymbol{x}_i^b, i=1,\ldots,n_b\}$ in total. The class prior distrib and the optimal prediction function of (eq:our_minimax) satisfies $\phi^*_k(\boldsymbol{x}_i^b)={P_

Figures (7)

  • Figure 1: An overview of our WDRDG framework, consisting of three components: (a) Wasserstein uncertainty set construction for each class based on the empirical Wasserstein barycenters and radius obtained from the given source domains, with one constraint added to control the discriminability of the least favorable distributions (LFDs); (b) distributionally robust optimization to solve for the LFDs; (c) adaptive inference for target testing samples based on the probability mass on the LFDs and the coupling matrix from optimal transport between barycenter samples and target samples.
  • Figure 2: Comparison between $\theta_i^*+\theta_j^*$ and the Wasserstein distance $W_2(B_i^*, B_j^*)$ for all $10$ unique pairs $(i,j)$ among the $5$ classes of the VLCS dataset. The sum of the uncertainty radii of any two classes is larger than the Wasserstein distance between the corresponding barycenters. Such oversized radii lead to overlapping class-specific uncertainty sets, so the distributions within them become indistinguishable.
  • Figure 3: Visualization of example images from four domains of the Rotated MNIST dataset with rotation angles of $0^\circ$, $30^\circ$, $60^\circ$, $90^\circ$.
  • Figure 4: Performance comparison on the VLCS, PACS and Rotated MNIST datasets under different numbers of training samples per class. Each row shows the results for one dataset, and each column shows the generalization result for one target domain. The average performance of each of the five methods is shown in a different color, and the corresponding shaded region shows the standard deviation over 5 experimental trials. Our WDRDG framework outperforms KNN, MDA and CIDG with higher accuracy and smaller standard deviation. It also has a larger advantage over MLDG, especially when the source training sample size is limited. For example, WDRDG outperforms MLDG by up to $46.79\%$ when the target domain is Caltech-101 in the VLCS dataset, by up to $20.95\%$ for target domain Art Painting in the PACS dataset, and by up to $20.71\%$ for target domain $r_{0}$ in the Rotated MNIST dataset with a training sample size of 2 for each class.
  • Figure 5: Average generalization performance of different methods on the VLCS, PACS and Rotated MNIST datasets. As the training sample size increases, all methods obtain better performance. Our WDRDG framework outperforms the other baselines, especially in few-shot settings. When the sample size is less than 10 per class, WDRDG achieves at least $3.75\%$, $4.73\%$, and $3.86\%$ better generalization performance than the other methods on the VLCS, PACS and Rotated MNIST datasets, respectively.
  • ...and 2 more figures
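
The overlap check described in the Figure 2 caption can be illustrated with a minimal toy sketch. It assumes 1-D barycenter supports with equally weighted samples, where the 2-Wasserstein distance reduces to matching sorted samples; the paper's features are higher-dimensional, so this is not the authors' implementation. The function names `w2_1d` and `sets_overlap` and all numbers are hypothetical.

```python
import math


def w2_1d(xs, ys):
    """2-Wasserstein distance between two 1-D empirical distributions
    that have the same number of equally weighted samples."""
    if len(xs) != len(ys):
        raise ValueError("this shortcut needs equal sample counts")
    # In 1-D the optimal coupling pairs the sorted samples in order.
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(xs))


def sets_overlap(radius_i, radius_j, distance):
    """True when two Wasserstein balls with the given radii, whose
    centers are `distance` apart, can share a distribution."""
    return radius_i + radius_j >= distance


# Hypothetical 1-D supports of two class barycenters.
barycenter_i = [0.0, 1.0, 2.0]
barycenter_j = [3.0, 5.0, 4.0]
dist = w2_1d(barycenter_i, barycenter_j)

print(dist)                          # distance between the two barycenters
print(sets_overlap(2.0, 2.0, dist))  # oversized radii: the sets overlap
print(sets_overlap(1.0, 1.0, dist))  # smaller radii: the sets stay separate
```

When the check returns true for a pair of classes, some distribution lies in both uncertainty sets, which is the situation the caption describes as indistinguishable.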

Theorems & Definitions (3)

  • Theorem 1
  • Proposition 1
  • Theorem 2