Proxy Methods for Domain Adaptation

Katherine Tsai, Stephen R. Pfohl, Olawale Salaudeen, Nicole Chiou, Matt J. Kusner, Alexander D'Amour, Sanmi Koyejo, Arthur Gretton

TL;DR

This work demonstrates that proxy variables allow for adaptation to distribution shift without explicitly recovering or modeling latent variables, and develops a two-stage kernel estimation approach to adapt to complex distribution shifts in two settings: Concept Bottleneck and Multi-domain.

Abstract

We study the problem of domain adaptation under distribution shift, where the shift is due to a change in the distribution of an unobserved, latent variable that confounds both the covariates and the labels. In this setting, neither the covariate shift nor the label shift assumptions apply. Our approach to adaptation employs proximal causal learning, a technique for estimating causal effects in settings where proxies of unobserved confounders are available. We demonstrate that proxy variables allow for adaptation to distribution shift without explicitly recovering or modeling latent variables. We consider two settings: (i) Concept Bottleneck, where an additional "concept" variable is observed that mediates the relationship between the covariates and labels; and (ii) Multi-domain, where training data from multiple source domains is available and each source domain exhibits a different distribution over the latent confounder. We develop a two-stage kernel estimation approach to adapt to complex distribution shifts in both settings. In our experiments, we show that our approach outperforms other methods, notably those that explicitly recover the latent confounder.
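The claim that neither the covariate shift nor the label shift assumption applies under a latent shift can be checked on a toy discrete model. The sketch below is illustrative only: the binary variables and the conditional tables `p_x_given_u` and `p_y1_given_xu` are hypothetical values chosen for this example, not taken from the paper. Only $P(U)$ differs between the two domains, yet both $P(Y \mid X)$ and $P(X \mid Y)$ change.

```python
import numpy as np

# Hypothetical conditional tables, shared by source and target (not from the paper).
# Rows index the latent U, columns index the covariate X.
p_x_given_u = np.array([[0.8, 0.2],    # P(X=x | U=0)
                        [0.3, 0.7]])   # P(X=x | U=1)
p_y1_given_xu = np.array([[0.1, 0.4],  # P(Y=1 | X=x, U=0)
                          [0.6, 0.9]]) # P(Y=1 | X=x, U=1)

def joint_xy(p_u1):
    """Return the table P(X=x, Y=y) after marginalizing out the latent U."""
    p_u = np.array([1.0 - p_u1, p_u1])
    p_ux = p_u[:, None] * p_x_given_u                  # P(U=u, X=x)
    p_uxy = np.stack([p_ux * (1.0 - p_y1_given_xu),    # Y=0
                      p_ux * p_y1_given_xu], axis=-1)  # Y=1
    return p_uxy.sum(axis=0)                           # shape (x, y)

def y_given_x(p_xy):
    return p_xy / p_xy.sum(axis=1, keepdims=True)

def x_given_y(p_xy):
    return p_xy / p_xy.sum(axis=0, keepdims=True)

source = joint_xy(0.1)  # P(U=1) = 0.1 in the source domain
target = joint_xy(0.9)  # Q(U=1) = 0.9 in the target domain

# Covariate shift would require P(Y | X) to be the same in both domains.
print("P(Y=1 | X) source:", y_given_x(source)[:, 1])
print("P(Y=1 | X) target:", y_given_x(target)[:, 1])

# Label shift would require P(X | Y) to be the same in both domains.
print("P(X=0 | Y) source:", x_given_y(source)[0, :])
print("P(X=0 | Y) target:", x_given_y(target)[0, :])
```

Because the tables are computed exactly rather than sampled, any difference between the printed rows is due to the change in $P(U)$ alone.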

Paper Structure (36 sections, 10 theorems, 76 equations, 7 figures, 2 tables)


Key Result

Theorem 4.1

Assume that $h_0^p$ and $h_0^q$ exist (i.e., regularity Assumptions \ref{assumption:completeness_h0}--\ref{assumption:rc_h0_2} hold). Then, given Assumptions \ref{assumption:graph}, \ref{assumption:CI}, \ref{assumption:completeness}, and \ref{assumption:subsetsupport}, we have that, for any $c\in\mathcal{C}$, [...] almost surely with respect to $Q(U)$. This implies that [...]

Figures (7)

  • Figure 1: Causal diagrams. A shaded circle denotes an unobserved variable and a solid circle denotes an observed variable. $X$ is the covariate, $Y$ is the response, $C$ is the concept, $W$ is the proxy, $Z$ is the domain-related variable, and $U$ is the latent variable.
  • Figure 2: Adaptation results with concept and proxy. Shown is the average evaluation metric on held-out target distribution samples across 10 independent replicates of the data. The proposed method is more robust to the latent shift than the baselines in both cases. (a) We set $P(U=1)=0.1$. Both the AUROC and the accuracy of the proposed method remain nearly constant across degrees of shift, while the performance of the baselines drops as $Q(U=1)$ moves to $0.9$. (b) The left panel shows the density function of $U$; the overlapping area of the two distributions shrinks as $a$ moves rightward. The right panel shows that our method is robust even when the overlapping area between the two distributions is small.
  • Figure 3: Concept and multi-domain adaptation with MIMIC-CXR. Shown are the mean $\pm$ SD AUROC of concept (left) and multi-domain adaptation (right) for classification of "No finding" from embeddings of chest X-rays over five replicates of a sampling procedure that introduces a shift in the prevalence of "No finding" with patient sex subgroups, where radiology report embeddings serve as concept variables $C$ and patient age serves as the proxy $W$. In the concept adaptation experiment, the source domain corresponds to $P(U=1) = P(Y = 1 \mid \textrm{Sex}=\textrm{Female}) = P(Y = 0 \mid \textrm{Sex}=\textrm{Male})=0.1$. In the multi-domain adaptation experiment, we consider two source domains with $P(U=1) \in \{0.1, 0.2\}$.
  • Figure 4: Classification results with $a_w=2,3$. The figures indicate that LSA-S and LSA-S w/ target $W$ have comparable performance; aggregating the target $W$ does not seem to improve performance.
  • Figure 5: Top left: results of regression task 1. The proposed method is close to the ORACLE method, whereas all other competing methods are vulnerable to the distribution shifts. Other panels: results of regression task 2. In each plot, we fix $b$ and vary $a$. In all plots, the mean squared errors of all methods appear to converge to a point when $a=b$. This is the case when the target density function of $U$ has a peak centered around $0.5$, as shown in Figure \ref{fig:beta_distribution}, and hence $Y=(2U-1)X$ is close to zero for most samples.
  • ...and 2 more figures
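
The observation in the Figure 5 caption, that $Y=(2U-1)X$ is close to zero for most samples when the density of $U$ peaks around $0.5$, can be checked numerically. The sketch below rests on assumptions not stated in this summary: $U$ is drawn from a Beta($a$, $b$) distribution (suggested only by the figure label `fig:beta_distribution`), $X$ is standard normal, and the parameter values are hypothetical. It compares the typical magnitude of $Y$ for a symmetric case ($a=b>1$) against a skewed one.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_abs_y(a, b, n=100_000):
    """Average |Y| for Y = (2U - 1) X, with U ~ Beta(a, b) and X ~ N(0, 1).

    Both sampling distributions are assumptions made for illustration.
    """
    u = rng.beta(a, b, size=n)
    x = rng.standard_normal(n)
    y = (2.0 * u - 1.0) * x
    return np.abs(y).mean()

# a = b > 1: the density of U is symmetric with its peak at 0.5,
# so the factor (2U - 1) is concentrated near zero.
symmetric = mean_abs_y(5.0, 5.0)

# a > b: the mass of U moves toward 1, so (2U - 1) is bounded away from zero.
skewed = mean_abs_y(5.0, 1.0)

print(f"mean |Y| with a = b = 5:    {symmetric:.3f}")
print(f"mean |Y| with a = 5, b = 1: {skewed:.3f}")
```

When $Y$ is this close to zero, even a predictor that ignores the shift incurs a small squared error, which is consistent with the caption's remark that the errors of all methods converge when $a=b$.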

Theorems & Definitions (11)

  • Theorem 4.1
  • Proposition 4.2
  • Proposition 4.3
  • Example 4.4
  • Proposition 5.1
  • Proposition A.1: Existence of $h_0$, adapted from Proposition 1 in miao2018identifying
  • Proposition A.2: Existence of $m_0$, Proposition 1 in miao2018identifying
  • Lemma A.3: Picard's Theorem
  • Lemma C.1
  • Lemma C.2
  • ...and 1 more