Label Alignment Regularization for Distribution Shift

Ehsan Imani, Guojun Zhang, Runjia Li, Jun Luo, Pascal Poupart, Philip H. S. Torr, Yangchen Pan

TL;DR

This work proposes a regularization method for unsupervised domain adaptation that encourages the predictions in the target domain to align with the top singular vectors of the target data. It demonstrates the method on problems where traditional domain adaptation methods often fall short due to high joint error.

Abstract

Recent work has highlighted the label alignment property (LAP) in supervised learning, where the vector of all labels in the dataset is mostly in the span of the top few singular vectors of the data matrix. Drawing inspiration from this observation, we propose a regularization method for unsupervised domain adaptation that encourages alignment between the predictions in the target domain and its top singular vectors. Unlike conventional domain adaptation approaches that focus on regularizing representations, we instead regularize the classifier to align with the unsupervised target data, guided by the LAP in both the source and target domains. Theoretical analysis demonstrates that, under certain assumptions, our solution resides within the span of the top right singular vectors of the target domain data and aligns with the optimal solution. By removing the reliance on the commonly used optimal joint risk assumption found in classic domain adaptation theory, we showcase the effectiveness of our method on addressing problems where traditional domain adaptation methods often fall short due to high joint error. Additionally, we report improved performance over domain adaptation baselines in well-known tasks such as MNIST-USPS domain adaptation and cross-lingual sentiment analysis.
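The label alignment property described in the abstract can be checked numerically. The sketch below is illustrative only: the function name `label_alignment`, the synthetic data, and the choice of score (the fraction of the squared norm of the label vector captured by the top-$k$ left singular vectors of the data matrix) are assumptions made for this example, not definitions taken from the paper.

```python
import numpy as np

def label_alignment(X, y, k):
    """Fraction of ||y||^2 lying in the span of the top-k left singular vectors of X.

    Hypothetical helper for illustration; the paper may quantify alignment differently.
    """
    # np.linalg.svd returns singular values in descending order,
    # so the first k columns of U are the top-k left singular vectors.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    coeffs = U[:, :k].T @ y
    return float(coeffs @ coeffs / (y @ y))

rng = np.random.default_rng(0)
n, d = 200, 10

# Synthetic data with one dominant direction (an assumption of this example).
direction = np.zeros(d)
direction[0] = 1.0
X = 0.1 * rng.normal(size=(n, d)) + 5.0 * np.outer(rng.normal(size=n), direction)

# Labels that depend on the dominant feature, and random labels for comparison.
y = np.sign(X[:, 0])
y_rand = rng.choice([-1.0, 1.0], size=n)

aligned = label_alignment(X, y, k=1)
full = label_alignment(X, y, k=d)
random_score = label_alignment(X, y_rand, k=1)
print(aligned, full, random_score)
```

In this toy setting the labels tied to the dominant feature score much higher than random labels, which is the qualitative behavior the property describes.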

Paper Structure

This paper contains 30 sections, 14 theorems, 56 equations, 5 figures, 5 tables, 1 algorithm.

Key Result

Proposition 0

In the example in this section suppose $v_1^\top \tilde{v}_1 \neq 0$. Then the label alignment solution is $\widehat{w^*} = c w_\mathcal{T}^* / v_1^\top \tilde{v}_1$ with $c > 0$.

Figures (5)

  • Figure 1: (a) Source domain. The black arrows show principal components. (b) Target domain. The green lines show separating hyperplanes found without using any regularization (dashed) and with our regularizer with $\lambda=10^3$ (solid). (c) Performance on the target domain. The red line shows the performance of DANN. The x axis is the regularization coefficient for $\ell_2$ regularization (orange curve) and $\lambda$ for the proposed regularizer (green curves). The proposed regularizer achieves near-perfect accuracy on this domain. Shaded areas are standard errors over 10 runs. Variations in target accuracy of DANN are near zero.
  • Figure 2: (a) Without Implicit Removal. The cyan dashed line is the decision boundary without any adaptation. The orange line shows the decision boundary when $\lambda$ is set to 1 for our proposed regularizer without implicit removal. (b) With Implicit Removal. The green line shows the decision boundary when $\lambda$ is set to 1 with implicit removal. (c) Performance on the target domain. The horizontal axis is $\lambda$ for the proposed regularizer. Before $\lambda$ dominates, the benefits of removing implicit regularization are significant. Shaded areas are standard errors over 10 runs.
  • Figure 3: Projection of the label vector on the top two singular vectors in the Gaussian example. For small values of standard deviation (where the labels are highly correlated with the features) and small values of $\delta$, the label vector is mostly in the direction of the top two singular vectors. The lower bound is applicable in this regime and is close to one.
  • Figure 4: $\lambda$ sensitivity of accuracy on the MNIST-USPS multiclass benchmark. The performance of the proposed method is relatively stable across values of $\lambda$ under various imbalance (subsampling) ratios. A larger $\lambda$ generally gives better target-domain performance because the loss places more weight on information from the target domain.
  • Figure 5: (a) Source domain. The black arrows show principal components. (b) Target domain. The arrows show weights found without using any regularization (purple) and with our regularizer with $\lambda=10^3$ (green). (c) Distance between the estimated and the optimal weights. The proposed regularizer reduces this distance. (d) Performance on the target domain. The x axis is the regularization coefficient for $\ell_2$ regularization and $\lambda$ for the proposed regularizer. The proposed regularizer achieves lower error on this domain. Shaded areas are standard errors over 10 runs.
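
The captions above refer to a coefficient $\lambda$ that weights the proposed regularizer. As a rough illustration of how such a coefficient could act in a linear least-squares setting, the sketch below penalizes the components of the weight vector along the non-top right singular vectors of the target data, scaled by the squared singular values. This penalty form, the function names, and the synthetic data are assumptions chosen to match the abstract's description; they are not the paper's stated objective or experimental setup.

```python
import numpy as np

def fit_with_alignment_penalty(X_src, y_src, X_tgt, k, lam):
    """Closed-form minimizer of
        ||X_src w - y_src||^2 + lam * sum_{i>k} s_i^2 (v_i^T w)^2,
    where s_i, v_i are singular values / right singular vectors of X_tgt.
    The penalty form is an assumption made for this sketch.
    """
    _, s, Vt = np.linalg.svd(X_tgt, full_matrices=False)
    V_rest = Vt[k:].T                                  # directions beyond the top k
    penalty = V_rest @ np.diag(s[k:] ** 2) @ V_rest.T  # d x d penalty matrix
    A = X_src.T @ X_src + lam * penalty
    return np.linalg.solve(A, X_src.T @ y_src)

def off_span_fraction(w, X_tgt, k):
    """Fraction of ||w||^2 outside the span of the top-k right singular vectors."""
    _, _, Vt = np.linalg.svd(X_tgt, full_matrices=False)
    top = Vt[:k] @ w
    return float(1.0 - top @ top / (w @ w))

rng = np.random.default_rng(0)
d, k = 5, 1
w_true = np.ones(d)

# Labeled source data and unlabeled target data (both synthetic).
X_src = rng.normal(size=(100, d))
y_src = X_src @ w_true + 0.1 * rng.normal(size=100)
X_tgt = rng.normal(size=(100, d)) * np.array([5.0, 1.0, 1.0, 1.0, 1.0])

w_plain = fit_with_alignment_penalty(X_src, y_src, X_tgt, k, lam=0.0)
w_reg = fit_with_alignment_penalty(X_src, y_src, X_tgt, k, lam=1e6)

off_plain = off_span_fraction(w_plain, X_tgt, k)
off_reg = off_span_fraction(w_reg, X_tgt, k)
print(off_plain, off_reg)
```

With a large `lam`, the fitted weights in this toy problem lie almost entirely in the span of the top right singular vector of the target data, in line with the abstract's claim about where the solution resides. Only the unlabeled target inputs enter the penalty, so no target labels are needed.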

Theorems & Definitions (14)

  • Proposition 0
  • Theorem 1
  • Theorem 2
  • Corollary 2
  • Proposition 2
  • Proposition 2
  • Lemma 3
  • Proposition 4
  • Lemma 5
  • Proposition 5
  • ...and 4 more