Point-Identification of a Robust Predictor Under Latent Shift with Imperfect Proxies

Zahra Rahiminasab, Reza Soumi, Arto Klami, Samuel Kaski

Abstract

Domain adaptation becomes more challenging when distribution shifts across domains stem from latent confounders that affect both covariates and outcomes. Existing proxy-based approaches to latent shift rely on a strong completeness assumption to uniquely determine (point-identify) a robust predictor. Completeness requires that the proxies carry sufficient information about variations in the latent confounders. For imperfect proxies, the mapping from confounders to the space of proxy distributions is non-injective, so multiple latent confounder values can generate the same proxy distribution. This breaks the completeness assumption, and the observed data are then consistent with multiple potential predictors (the predictor is only set-identified). To address this, we introduce latent equivalent classes (LECs), defined as groups of latent confounders that induce the same conditional proxy distribution. We show that point-identification of the robust predictor remains achievable as long as multiple domains differ sufficiently in how they mix proxy-induced LECs to form the robust predictor. This domain diversity condition is formalized as a cross-domain rank condition on the mixture weights, which is a substantially weaker assumption than completeness. We introduce the Proximal Quasi-Bayesian Active Learning (PQAL) framework, which actively queries a minimal set of diverse domains that satisfy this rank condition. PQAL efficiently recovers the point-identified predictor, is robust to varying degrees of shift, and outperforms previous methods on synthetic data and the semi-synthetic dSprites dataset.
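The two ideas in the abstract, grouping latent values into LECs and checking a cross-domain rank condition, can be illustrated with a small discrete toy example. This is a sketch under stated assumptions, not the paper's construction: the proxy table `P_W_given_U`, the helper names `latent_equivalence_classes` and `satisfies_rank_condition`, and the choice to stack per-domain weights over latent values into a matrix and require full column rank are all hypothetical. The paper's exact rank condition on the mixture weights may be stated differently.

```python
import numpy as np

# Hypothetical toy proxy model: 4 latent confounder values, proxy W with 3 categories.
# Row u holds P(W | U = u). Rows 0 and 1 coincide, as do rows 2 and 3, so the
# proxy is imperfect: the map from u to its proxy distribution is non-injective.
P_W_given_U = np.array([
    [0.7, 0.2, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.1, 0.3, 0.6],
])


def latent_equivalence_classes(cond, decimals=8):
    """Group latent values whose conditional proxy distributions coincide."""
    classes = {}
    for u, row in enumerate(cond):
        key = tuple(np.round(row, decimals))
        classes.setdefault(key, []).append(u)
    return list(classes.values())


def satisfies_rank_condition(weights, n_components):
    """Assumed reading of domain diversity: the (domains x components)
    weight matrix must have full column rank."""
    return np.linalg.matrix_rank(weights) == n_components


lecs = latent_equivalence_classes(P_W_given_U)
proxy_rank = np.linalg.matrix_rank(P_W_given_U)

# Hypothetical per-domain mixing weights over the 4 latent values (rows sum to 1).
weights_diverse = 0.6 * np.eye(4) + 0.1 * np.ones((4, 4))
weights_few = weights_diverse[:2]  # only two domains observed

few_ok = satisfies_rank_condition(weights_few, n_components=4)
diverse_ok = satisfies_rank_condition(weights_diverse, n_components=4)

print("LECs:", lecs)
print("rank of proxy table:", proxy_rank)
print("two domains suffice:", few_ok)
print("four diverse domains suffice:", diverse_ok)
```

In this toy setting the proxy table has rank 2 although there are 4 latent values, which is the discrete analogue of completeness failing. Two domains cannot supply the missing rank, whereas four sufficiently different domains can.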

Paper Structure (35 sections, 5 theorems, 61 equations, 4 figures, 1 table, 1 algorithm)

Key Result

Lemma 1

Let $\{O_j\}_{j=1}^{|O|}$ be the latent equivalent classes (LECs) defined in Definition 2. Assume the regularity condition of Assumption ass:Reg holds. For each LEC $O_j$, define its conditional distribution $P_j(A)=P(W\in A \mid U\in O_j)$, the corresponding mixing weight $\pi_j(x,z):=P(U \in \dots$

Figures (4)

  • Figure 1: Causal diagram that complies with the latent shift assumption
  • Figure 2: Overview of PQAL framework
  • Figure 3: Effective rank with respect to proxy imperfectness for different numbers of environments
  • Figure 4: MSE for different acquisition functions on Dataset 1 (left) and Dataset 2 (right)

Theorems & Definitions (18)

  • Definition 1: Imperfect proxies
  • Definition 2: Latent equivalent classes (LECs)
  • Remark 1: Non-degenerate LECs
  • Lemma 1: Decomposition of Conditional Distribution
  • proof
  • Definition 3: Distinguishing environment set
  • Theorem 1: Cross-domain rank condition and completeness
  • proof
  • Theorem 2: Existence and set identifiability of bridge function
  • Theorem 3: Point-identification of predictor
  • ...and 8 more