Gradual Fine-Tuning with Graph Routing for Multi-Source Unsupervised Domain Adaptation

Yao Ma, Samuel Louvan, Zhunxuan Wang

TL;DR

This paper introduces a framework for gradual fine-tuning (GFT) of machine learning models on multiple source domains, which are represented as an undirected weighted graph. It gives a new generalization error bound along any path within the graph and uses it to determine the optimal path, which corresponds to the optimal training order.

Abstract

Multi-source unsupervised domain adaptation aims to leverage labeled data from multiple source domains to train a machine learning model that generalizes well on an unlabeled target domain. Source domain selection plays a crucial role in determining the model's performance, and it relies on the similarities among the source and target domains. Nonetheless, existing work on source domain selection often involves heavyweight computational procedures, especially when there are numerous source domains and the best ones must be identified among them. In this paper, we introduce a framework for gradual fine-tuning (GFT) of machine learning models on multiple source domains. We represent multiple source domains as an undirected weighted graph. We then give a new generalization error bound for GFT along any path within the graph, which is used to determine the optimal path corresponding to the optimal training order. With this formulation, we introduce three lightweight graph-routing strategies that tend to minimize the error bound. Our best strategy improves accuracy by $2.3\%$ over the state of the art on the Natural Language Inference (NLI) task and achieves competitive performance on the Sentiment Analysis (SA) task, notably a $3.9\%$ improvement on a more diverse subset of the data we use for SA.
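The framework above turns source selection into routing on a weighted graph. As a rough illustration only, the Python sketch below assumes that edge weights are pairwise Wasserstein-1 distances between domain feature samples (the quantity shown in the matrices of Figures 4 and 5), computed exactly for equal-size point clouds as an assignment problem with SciPy. The greedy nearest-neighbor ordering is a hypothetical stand-in, not one of the paper's three routing strategies, and summing edge weights along the path is an assumed proxy for the error bound. All helper names (`wasserstein1`, `distance_graph`, `greedy_route`, `path_cost`) and the synthetic domains are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def wasserstein1(x: np.ndarray, y: np.ndarray) -> float:
    """Exact W1 between two equal-size point clouds with uniform weights.

    With n points per side and mass 1/n each, an optimal transport plan can be
    taken to be a permutation, so W1 reduces to a minimum-cost assignment.
    """
    if len(x) != len(y):
        raise ValueError("this shortcut assumes equal sample sizes")
    cost = cdist(x, y)  # Euclidean ground cost between all point pairs
    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].mean())


def distance_graph(domains):
    """Hypothetical helper: symmetric W1 matrix used as undirected edge weights."""
    k = len(domains)
    w = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            w[i, j] = w[j, i] = wasserstein1(domains[i], domains[j])
    return w


def greedy_route(w, target, length):
    """Hypothetical nearest-neighbor routing (NOT the paper's strategies).

    Starting at the target node, repeatedly hop to the closest unvisited source,
    then reverse the chain so training starts far from the target and ends at
    the source nearest to it.
    """
    length = min(length, len(w) - 1)
    chain, current = [], target
    while len(chain) < length:
        candidates = [j for j in range(len(w)) if j != target and j not in chain]
        nearest = min(candidates, key=lambda j: w[current, j])
        chain.append(nearest)
        current = nearest
    return chain[::-1]


def path_cost(w, path, target):
    """Sum of edge weights along path -> target (an assumed proxy for the bound)."""
    hops = list(path) + [target]
    return float(sum(w[a, b] for a, b in zip(hops, hops[1:])))


# Usage on synthetic 2-D "domains": sources 0-2 and an unlabeled target 3.
rng = np.random.default_rng(0)
centers = [(0.0, 0.0), (3.0, 0.0), (6.0, 0.0), (7.0, 0.0)]
domains = [rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in centers]

w = distance_graph(domains)
order = greedy_route(w, target=3, length=3)
print("training order:", order, "path cost:", round(path_cost(w, order, 3), 2))
```

Reversing the greedy chain makes the model see the most distant source first and the source closest to the target last. Exact assignment scales roughly cubically with the number of samples, so for large domains one would likely subsample or use an approximate optimal-transport solver.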

Paper Structure

This paper contains 14 sections, 6 theorems, 28 equations, 5 figures, 4 tables.

Key Result

Lemma 5.1

Given two joint distributions $D_1$ and $D_2$ over $X\times Y$, the expected loss of a classifier $h$ satisfies

Figures (5)

  • Figure 1: GFT illustration for $2$-source domain adaptation with a linear binary SVM. Source 1 has a distribution close to the target but is very small, whereas Source 2 is large but diverges further from the target. The model is first trained from scratch on Source 2, giving a clear hyperplane that separates the two classes, and is then fine-tuned on Source 1, which shifts the hyperplane toward the Source 1 distribution. Evaluation on the target demonstrates the efficacy of GFT.
  • Figure 2: Accuracy ablation over path length on the two datasets. The $x$-axis is the path length of NnGft, i.e., the number of source domains included in NnGft; the $y$-axis is the accuracy at that path length.
  • Figure 3: Two datasets drawn from different distributions are available as training data. The test data from the target domain has lower discrepancy to dataset 1 than to dataset 2. A linear model trained only on dataset 1 achieves $0.555$ accuracy on the test data, as shown in subfigure (a). The same model trained only on dataset 2 achieves $0.54$ accuracy. Although dataset 1 is more similar to the test data, it still yields poor performance because its number of samples is very limited. When the two sources are combined, the model's accuracy is $0.53$, since dataset 2 has many more samples than dataset 1. In the last subfigure, we apply the GFT algorithm following the order Source $2 \to$ Source 1, and the model achieves $0.805$ accuracy.
  • Figure 4: The matrix of pairwise Wasserstein-1 distance between domains in the MultiNLI dataset.
  • Figure 5: The matrix of pairwise Wasserstein-1 distance between domains in the Amazon dataset.
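Figures 1 and 3 illustrate GFT as training a linear classifier on the large, distant source and then fine-tuning the same parameters on the small source near the target. The numpy sketch below mimics that setup under stated assumptions: a linear SVM fitted by full-batch subgradient descent on the L2-regularized hinge loss, synthetic 2-D domains whose labels flip at a vertical boundary (shifted for Source 2), and identical hyperparameters at every stage. The helpers `train`, `gradual_fine_tune`, and `make_domain` are hypothetical, and the script is not expected to reproduce the accuracies reported in the figures.

```python
import numpy as np


def train(X, y, w, b, lr=0.1, steps=200, lam=0.01):
    """Full-batch subgradient descent on the L2-regularized hinge loss (linear SVM).

    Starts from the given (w, b), so the same routine serves for training from
    scratch (w = 0) and for fine-tuning an already trained model.
    """
    n = len(y)
    for _ in range(steps):
        active = y * (X @ w + b) < 1  # points inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def accuracy(X, y, w, b):
    return float(np.mean(np.sign(X @ w + b) == y))


def gradual_fine_tune(ordered_sources, dim=2, lr=0.1, steps=200):
    """Hypothetical GFT loop: train from scratch on the first source of the path,
    then keep fine-tuning the same (w, b) on each subsequent source."""
    w, b = np.zeros(dim), 0.0
    for X, y in ordered_sources:
        w, b = train(X, y, w, b, lr=lr, steps=steps)  # same hyperparameters per stage
    return w, b


rng = np.random.default_rng(0)


def make_domain(n, boundary):
    """Hypothetical 2-D domain whose labels flip at x0 = boundary."""
    X = rng.normal(size=(n, 2))
    y = np.where(X[:, 0] > boundary, 1.0, -1.0)
    return X, y


source1 = make_domain(10, 0.0)     # tiny, same labeling rule as the target
source2 = make_domain(1000, 1.0)   # large, shifted labeling rule
X_t, y_t = make_domain(2000, 0.0)  # target; labels used only for evaluation

w_src2, b_src2 = gradual_fine_tune([source2])
w_gft, b_gft = gradual_fine_tune([source2, source1])  # order: Source 2 -> Source 1
X_joint = np.vstack([source1[0], source2[0]])
y_joint = np.concatenate([source1[1], source2[1]])
w_joint, b_joint = gradual_fine_tune([(X_joint, y_joint)])

results = {
    "source 2 only": accuracy(X_t, y_t, w_src2, b_src2),
    "joint": accuracy(X_t, y_t, w_joint, b_joint),
    "GFT 2 -> 1": accuracy(X_t, y_t, w_gft, b_gft),
}
for name, acc in results.items():
    print(f"{name:>14}: target accuracy = {acc:.3f}")
```

Passing a longer list to `gradual_fine_tune` extends the same loop to any path through the domain graph, which is the quantity varied in the path-length ablation of Figure 2.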

Theorems & Definitions (7)

  • Lemma 5.1
  • Theorem 5.2
  • Lemma A.1
  • Lemma A.2
  • proof
  • Lemma C.1
  • Lemma D.1