Bridging Domains with Approximately Shared Features

Ziliang Samuel Zhong; Xiang Pan; Qi Lei

TL;DR

We propose a statistical framework that distinguishes the utility of features by the variance of their correlation with the label $y$ across domains. The framework yields an improved population risk over previous results on both source and target tasks, partly resolving the apparent paradox between learning invariant features and learning more diverse features.

Abstract

Multi-source domain adaptation aims to reduce performance degradation when applying machine learning models to unseen domains. A fundamental challenge is devising the optimal strategy for feature selection. The existing literature is somewhat paradoxical: some works advocate learning invariant features from source domains, while others favor more diverse features. To address this challenge, we propose a statistical framework that distinguishes the utility of features by the variance of their correlation with the label $y$ across domains. Under our framework, we design and analyze a learning procedure that learns an approximately shared feature representation from the source tasks and fine-tunes it on the target task. Our theoretical analysis establishes the importance of learning approximately shared features rather than only the strictly invariant ones, and yields an improved population risk over previous results on both source and target tasks, thus partly resolving the paradox mentioned above. Inspired by our theory, we propose a more practical way to isolate the content features (invariant + approximately shared) from the environmental features, which further corroborates our theoretical findings.
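The idea of ranking features by how much their relationship to $y$ varies across domains can be illustrated with a small synthetic sketch. Everything in it is an assumption made for illustration rather than the paper's data generation process: the three-feature linear model, the per-domain coefficient grids, the helper names `make_domains` and `covariance_stats`, and the use of the unnormalized covariance between each feature and $y$ as the "correlation" measure.

```python
import numpy as np

def make_domains(num_domains=8, n=20000, seed=0):
    """Build hypothetical linear domains with three independent unit-variance features."""
    rng = np.random.default_rng(seed)
    shared_shift = np.linspace(-0.3, 0.3, num_domains)  # small cross-domain variation
    env_coef = np.linspace(-2.0, 2.0, num_domains)      # large, zero-mean variation
    domains = []
    for e in range(num_domains):
        X = rng.standard_normal((n, 3))
        w = np.array([
            1.0,                    # feature 0: invariant coefficient
            1.0 + shared_shift[e],  # feature 1: approximately shared coefficient
            env_coef[e],            # feature 2: environment-specific coefficient
        ])
        y = X @ w + 0.1 * rng.standard_normal(n)
        domains.append((X, y))
    return domains

def covariance_stats(domains):
    """Return the mean and variance, across domains, of Cov(x_j, y) for each feature j."""
    covs = []
    for X, y in domains:
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        covs.append(Xc.T @ yc / len(y))  # shape (d,)
    covs = np.stack(covs)                # shape (num_domains, d)
    return covs.mean(axis=0), covs.var(axis=0)

mean_cov, var_cov = covariance_stats(make_domains())
print("mean covariance per feature:    ", mean_cov)
print("variance of covariance per feature:", var_cov)
```

In this toy setup the invariant feature has near-zero variance, the approximately shared feature has a small but nonzero variance around a useful mean, and the environment-specific feature has a large variance around a mean close to zero. Discarding everything except feature 0 would throw away the predictive signal that feature 1 carries on average.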

Paper Structure (56 sections, 12 theorems, 135 equations, 8 figures, 9 tables)

Introduction
Our contribution.
Related Works
Selection bias
Spurious correlation
Notations
Methodology
Data Generation Process
The Meta-Representation Learning Algorithm
Isolating Content Features in Practice
Nuclear Norm Regularization
ProjectionNet (ours)
Theoretical Analysis
Experiments
Synthetic Data
...and 41 more sections

Key Result

Theorem 3.5

Under the assumptions labeled homogeneous input distribution, nonlinear uniform, trace assumption nonlinear, and diverse source tasks, if $n_1$ is large enough, then with probability $1-o(1)$ the average excess risk across the source environments satisfies a bound in which $\mathcal{C}_{\text{cont}} \asymp O(k)$ measures the complexity of content features and $\mathcal{C}_{\tex

Figures (8)

Figure 1: Diagram of the different feature types, defined mathematically in the data generation equation. Our work indicates that, in addition to invariant features, approximately shared features should be used to fully transfer knowledge from the source to the target domain. In practice, approximately shared features are learned as features correlated with both $y$ and the environment $e$, that is, the $y$-$e$ shared features.
Figure 2: Feature space visualization. We show GradCAM++ maps on the OfficeHome dataset for the feature spaces of source-pretrained models. ERM and NUC focus more locally and the DiWA feature is more globally distributed, while the feature space of ProjectionNet (ours) is more semantically meaningful.
Figure 3: ProjectionNet. We disentangle the base representation $\phi$ into a target-specific feature $\phi_y$, an approximately shared feature $\phi_s$, and an environment-specific feature $\phi_e$. ($\mathcal{P}_y$, $\mathcal{P}_s$, $\mathcal{P}_e$) are the projection heads applied to the base feature. [$\phi_y$, $\phi_s$] is used for target label prediction, and [$\phi_s$, $\phi_e$] is used for environment label prediction.
Figure 4: Target loss vs. number of few-shot samples for linear synthetic data. Both ${\rm Reg}_1$ and ${\rm Reg}_2$ in the linear fine-tuning objective help domain adaptation performance.
Figure 5: Linear probing results. ProjectionNet achieves adaptive performance similar to that of DiWA.
...and 3 more figures
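The head layout described in the Figure 3 caption can be sketched as a forward pass. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `ProjectionHeadsSketch`, the purely linear heads, the random initialization, and all dimensions are hypothetical, and no training losses are shown because the caption does not specify them.

```python
import numpy as np

class ProjectionHeadsSketch:
    """Hypothetical linear version of the three projection heads in Figure 3."""

    def __init__(self, d_base, d_proj, num_classes, num_envs, seed=0):
        rng = np.random.default_rng(seed)

        def init(rows, cols):
            # Scaled Gaussian initialization (an arbitrary illustrative choice).
            return rng.standard_normal((rows, cols)) / np.sqrt(rows)

        self.P_y = init(d_base, d_proj)  # target-specific projection
        self.P_s = init(d_base, d_proj)  # approximately shared projection
        self.P_e = init(d_base, d_proj)  # environment-specific projection
        self.W_label = init(2 * d_proj, num_classes)  # classifier on [phi_y, phi_s]
        self.W_env = init(2 * d_proj, num_envs)       # classifier on [phi_s, phi_e]

    def forward(self, phi):
        """phi: base features of shape (batch, d_base)."""
        phi_y = phi @ self.P_y
        phi_s = phi @ self.P_s
        phi_e = phi @ self.P_e
        label_logits = np.concatenate([phi_y, phi_s], axis=1) @ self.W_label
        env_logits = np.concatenate([phi_s, phi_e], axis=1) @ self.W_env
        return label_logits, env_logits

# Illustrative sizes only.
model = ProjectionHeadsSketch(d_base=32, d_proj=8, num_classes=5, num_envs=3)
phi = np.random.default_rng(1).standard_normal((4, 32))
label_logits, env_logits = model.forward(phi)
print(label_logits.shape, env_logits.shape)
```

The shared projection $\phi_s$ feeds both predictors, which matches the caption's description of features tied to both $y$ and the environment, while $\phi_e$ never reaches the label predictor.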

Theorems & Definitions (22)

Remark 2.1: Importance of approximately shared features
Remark 2.2: Distinction to prior work
Remark 2.3
Definition 3.2: Covariance between representations
Theorem 3.5: Source environment guarantee, informal version of Theorem 1.3
Theorem 3.6: Target environment guarantee, informal version of Theorem 1.4
Remark 3.7: Linear representation
Theorem 1.3: Full version of Theorem 3.5
Theorem 1.4: Full version of Theorem 3.6
Definition 1.5
...and 12 more
