Bridging Domains with Approximately Shared Features

Ziliang Samuel Zhong, Xiang Pan, Qi Lei

TL;DR

We propose a statistical framework that distinguishes the utilities of features by the variance of their correlation with the label $y$ across domains. It yields an improved population risk on both source and target tasks compared to previous results, partly resolving the paradox between learning invariant features and learning diverse features.
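
The central statistic in this framework, the variance of a feature's correlation with $y$ across domains, can be illustrated with a short NumPy sketch. It assumes features and labels are available separately for each source domain, uses Pearson correlation, and buckets features with hypothetical thresholds (`invariant_tol`, `shared_tol`) chosen only for this toy data. The function names are ours, and this is an illustration of the idea rather than the paper's estimator.

```python
import numpy as np


def per_domain_correlations(features_by_domain, labels_by_domain):
    """Pearson correlation of every feature with y, computed separately in each domain.

    Returns an array of shape (num_domains, num_features).
    """
    rows = []
    for X, y in zip(features_by_domain, labels_by_domain):
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        cov = Xc.T @ yc / len(y)                  # per-feature covariance with y
        scale = X.std(axis=0) * y.std() + 1e-12   # guard against constant features
        rows.append(cov / scale)
    return np.stack(rows)


def categorize_by_variance(corr, invariant_tol=0.005, shared_tol=0.1):
    """Hypothetical rule: bucket features by the cross-domain variance of their correlation."""
    variances = corr.var(axis=0)
    return [
        "invariant" if v < invariant_tol
        else "approximately shared" if v < shared_tol
        else "environment-specific"
        for v in variances
    ]


# Synthetic check: three source domains, three features with different correlation patterns.
rng = np.random.default_rng(0)
n = 5000
shared_scale = [0.5, 1.0, 1.5]  # feature 1: correlation with y varies mildly across domains
env_scale = [1.0, -1.0, 0.0]    # feature 2: correlation with y flips sign across domains
Xs, ys = [], []
for e in range(3):
    y = rng.normal(size=n)
    f0 = y + 0.1 * rng.normal(size=n)  # same relation to y in every domain
    f1 = shared_scale[e] * y + rng.normal(size=n)
    f2 = env_scale[e] * y + 0.5 * rng.normal(size=n)
    Xs.append(np.column_stack([f0, f1, f2]))
    ys.append(y)

corr = per_domain_correlations(Xs, ys)
labels = categorize_by_variance(corr)
print(labels)
```

On this toy data, feature 0 keeps a fixed relation to $y$, feature 1's correlation drifts moderately between domains, and feature 2's correlation changes sign, so the three features fall into the three buckets.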

Abstract

Multi-source domain adaptation aims to reduce performance degradation when applying machine learning models to unseen domains. A fundamental challenge is devising the optimal strategy for feature selection. Existing literature is somewhat paradoxical: some works advocate learning invariant features from source domains, while others favor more diverse features. To address this challenge, we propose a statistical framework that distinguishes the utilities of features based on the variance of their correlation with the label $y$ across domains. Under our framework, we design and analyze a learning procedure that learns an approximately shared feature representation from source tasks and fine-tunes it on the target task. Our theoretical analysis shows the importance of learning approximately shared features, rather than only strictly invariant features, and yields an improved population risk on both source and target tasks compared to previous results, thus partly resolving the paradox mentioned above. Inspired by our theory, we propose a more practical way to isolate content features (invariant and approximately shared) from environmental features, which further consolidates our theoretical findings.
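
The two-stage procedure described in the abstract (learn a representation from source tasks, then fine-tune on the target) can be sketched in a linear-regression setting. This is a minimal sketch under our own assumptions: every task is linear regression with isotropic Gaussian inputs, the shared subspace is estimated by an SVD of per-source least-squares coefficients (a common multi-task representation-learning heuristic, not necessarily the authors' estimator), and fine-tuning adds a ridge-penalized correction controlled by a hypothetical `lam` so the predictor can leave the learned subspace slightly. It is not the paper's exact linear fine-tuning objective.

```python
import numpy as np


def learn_shared_representation(X_sources, y_sources, k):
    """Stage 1 (sketch): estimate a k-dim subspace shared by the source regressors.

    Fits ordinary least squares in each source domain, stacks the coefficient
    vectors as columns, and keeps the top-k left singular vectors.
    """
    betas = [np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(X_sources, y_sources)]
    U, _, _ = np.linalg.svd(np.column_stack(betas), full_matrices=False)
    return U[:, :k]  # d x k matrix with orthonormal columns


def fine_tune_on_target(X, y, B, lam=100.0):
    """Stage 2 (sketch): fit a linear head on B^T x, then add a heavily
    ridge-penalized correction in the full input space."""
    Z = X @ B
    w = np.linalg.lstsq(Z, y, rcond=None)[0]
    resid = y - Z @ w
    d = X.shape[1]
    delta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ resid)
    return B @ w + delta


# Synthetic setting: every task's coefficients lie near a shared 3-dim subspace.
rng = np.random.default_rng(0)
d, k, num_sources, n_src, n_tgt, noise = 20, 3, 10, 200, 30, 1.0
B_true, _ = np.linalg.qr(rng.normal(size=(d, k)))


def make_task(n):
    # Approximately shared: a component in span(B_true) plus a small perturbation.
    beta = B_true @ rng.normal(size=k) + 0.05 * rng.normal(size=d)
    X = rng.normal(size=(n, d))
    return X, X @ beta + noise * rng.normal(size=n), beta


sources = [make_task(n_src) for _ in range(num_sources)]
B_hat = learn_shared_representation([s[0] for s in sources], [s[1] for s in sources], k)

X_t, y_t, beta_t = make_task(n_tgt)
beta_two_stage = fine_tune_on_target(X_t, y_t, B_hat)
beta_target_only = np.linalg.lstsq(X_t, y_t, rcond=None)[0]

# With isotropic Gaussian inputs, excess risk equals the squared parameter error.
err_two_stage = np.sum((beta_two_stage - beta_t) ** 2)
err_target_only = np.sum((beta_target_only - beta_t) ** 2)
print(f"two-stage: {err_two_stage:.3f}, target-only OLS: {err_target_only:.3f}")
```

With only 30 target samples in 20 dimensions, target-only least squares is noisy, while the head fitted on the learned 3-dimensional representation has far fewer parameters to estimate; the ridge correction recovers part of the target coefficient that lies outside the learned subspace.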

Paper Structure

This paper contains 56 sections, 12 theorems, 135 equations, 8 figures, and 9 tables.

Key Result

Theorem 3.5

Under the homogeneous input distribution, nonlinear uniform, nonlinear trace, and diverse source tasks assumptions, if $n_1$ is large enough, then with probability $1-o(1)$ the average excess risk across the source environments satisfies …, where $\mathcal{C}_{\text{cont}} \asymp O(k)$ measures the complexity of content features and $\mathcal{C}_{\text{…}}$ …

Figures (8)

  • Figure 1: Diagram of the different feature types, defined mathematically by the data-generation model. Our work indicates that, in addition to invariant features, we should use approximately shared features to fully transfer knowledge from the source to the target domain. A practical way to learn approximately shared features is to learn features that are correlated with both $y$ and the environment $e$, i.e., the $y$-$e$ shared features.
  • Figure 2: Feature space visualization: GradCAM++ maps on the OfficeHome dataset for the feature spaces of source-pretrained models. ERM and NUC focus more locally, and DiWA features are more globally distributed, while the feature space of ProjectionNet (ours) is more semantically meaningful.
  • Figure 3: ProjectionNet: We disentangle the base representation $\phi$ into a target-specific feature $\phi_y$, an approximately shared feature $\phi_s$, and an environment-specific feature $\phi_e$. $\mathcal{P}_y$, $\mathcal{P}_s$, and $\mathcal{P}_e$ are the projection heads applied to the base feature. $[\phi_y, \phi_s]$ is used for target label prediction, and $[\phi_s, \phi_e]$ is used for environment label prediction.
  • Figure 4: Target loss vs. number of few-shot target samples on linear synthetic data. Both ${\rm Reg}_1$ and ${\rm Reg}_2$ in the linear fine-tuning objective improve domain adaptation performance.
  • Figure 5: Linear probing results: ProjectionNet achieves adaptation performance similar to that of DiWA.
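
The routing described in the Figure 3 caption can be made concrete with a NumPy forward pass. This is a minimal sketch of the wiring only: it assumes linear projection heads, random initialization, hypothetical dimensions, and a summed cross-entropy over the label and environment predictions. It omits the backbone, the training loop, and any regularizers the paper may use, and the class name `ProjectionNetSketch` is ours.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)


def cross_entropy(probs, targets):
    return -np.mean(np.log(probs[np.arange(len(targets)), targets] + 1e-12))


class ProjectionNetSketch:
    """Forward-pass sketch of the head wiring in Figure 3 (hypothetical sizes, linear heads)."""

    def __init__(self, base_dim, proj_dim, num_classes, num_envs, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            return rng.normal(scale=base_dim ** -0.5, size=shape)

        self.P_y = init(base_dim, proj_dim)  # projection to target-specific features phi_y
        self.P_s = init(base_dim, proj_dim)  # projection to approximately shared features phi_s
        self.P_e = init(base_dim, proj_dim)  # projection to environment-specific features phi_e
        self.W_label = init(2 * proj_dim, num_classes)  # label classifier reads [phi_y, phi_s]
        self.W_env = init(2 * proj_dim, num_envs)       # environment classifier reads [phi_s, phi_e]

    def forward(self, phi):
        phi_y, phi_s, phi_e = phi @ self.P_y, phi @ self.P_s, phi @ self.P_e
        label_probs = softmax(np.concatenate([phi_y, phi_s], axis=1) @ self.W_label)
        env_probs = softmax(np.concatenate([phi_s, phi_e], axis=1) @ self.W_env)
        return label_probs, env_probs


# Usage: phi stands in for base features from a pretrained backbone (random here).
data_rng = np.random.default_rng(1)
phi = data_rng.normal(size=(8, 64))
y = data_rng.integers(0, 5, size=8)
env = data_rng.integers(0, 3, size=8)

model = ProjectionNetSketch(base_dim=64, proj_dim=16, num_classes=5, num_envs=3)
label_probs, env_probs = model.forward(phi)
# Assumed objective: sum of label and environment cross-entropies.
loss = cross_entropy(label_probs, y) + cross_entropy(env_probs, env)
print(f"joint loss: {loss:.3f}")
```

Because $\phi_s$ feeds both classifiers, it is encouraged to carry information about both $y$ and $e$, which matches the $y$-$e$ shared features described in the Figure 1 caption.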

Theorems & Definitions (22)

  • Remark 2.1: Importance of approximately shared features
  • Remark 2.2: Distinction to prior work
  • Remark 2.3
  • Definition 3.2: Covariance between representations
  • Theorem 3.5: Source environment guarantee, informal version of Theorem 1.3
  • Theorem 3.6: Target environment guarantee, informal version of Theorem 1.4
  • Remark 3.7: Linear representation
  • Theorem 1.3: Full version of Theorem 3.5
  • Theorem 1.4: Full version of Theorem 3.6
  • Definition 1.5