Lighter, Better, Faster Multi-Source Domain Adaptation with Gaussian Mixture Models and Optimal Transport

Eduardo Fernandes Montesuma, Fred Ngolè Mboula, Antoine Souloumiac

TL;DR

The paper addresses Multi-Source Domain Adaptation (MSDA) under distribution shift with an optimal-transport framework built on Gaussian Mixture Models (GMMs). It introduces two strategies, GMM-Wasserstein Barycenter Transport (GMM-WBT) and GMM-Dataset Dictionary Learning (GMM-DaDiL), together with a parametric algorithm for computing mixture-Wasserstein barycenters and a supervised variant that incorporates class labels. The core contributions are first-order optimality conditions for GMM component parameters under $\mathcal{MW}_{2}$, a supervised mixture-Wasserstein distance, and efficient barycenter algorithms tailored to GMMs. On four benchmarks, the proposed methods match or outperform prior art while using fewer parameters and less computation time.

Abstract

In this paper, we tackle Multi-Source Domain Adaptation (MSDA), a task in transfer learning where one adapts multiple heterogeneous, labeled source probability measures towards a different, unlabeled target measure. We propose a novel framework for MSDA, based on Optimal Transport (OT) and Gaussian Mixture Models (GMMs). Our framework has two key advantages. First, OT between GMMs can be solved efficiently via linear programming. Second, it provides a convenient model for supervised learning, especially classification, as components in the GMM can be associated with existing classes. Based on the GMM-OT problem, we propose a novel technique for calculating barycenters of GMMs. Based on this novel algorithm, we propose two new strategies for MSDA: GMM-Wasserstein Barycenter Transport (WBT) and GMM-Dataset Dictionary Learning (DaDiL). We empirically evaluate our proposed methods on four benchmarks in image classification and fault diagnosis, showing that we improve over the prior art while being faster and involving fewer parameters. Our code is publicly available at https://github.com/eddardd/gmm_msda
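
The abstract notes that OT between GMMs reduces to a linear program over component weights. The following is a minimal sketch of that idea, not the authors' implementation. It assumes axis-aligned (diagonal-covariance) components and uses the closed-form squared 2-Wasserstein distance between diagonal Gaussians as the ground cost. The function name `gmm_ot` and the choice of SciPy's `linprog` are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog


def gmm_ot(p, m_p, s_p, q, m_q, s_q):
    """Solve discrete OT between the components of two axis-aligned GMMs.

    p, q: mixture weights, shapes (K_P,) and (K_Q,), each summing to 1.
    m_*, s_*: component means and standard deviations, shapes (K, d).
    Returns the plan omega with shape (K_P, K_Q) and the transport cost.
    """
    k_p, k_q = len(p), len(q)
    # Assumed ground cost: squared W2 between diagonal Gaussians,
    # ||m_i - m_j||^2 + ||s_i - s_j||^2.
    cost = (
        ((m_p[:, None, :] - m_q[None, :, :]) ** 2).sum(-1)
        + ((s_p[:, None, :] - s_q[None, :, :]) ** 2).sum(-1)
    )
    # Marginal constraints on the row-major flattened plan.
    row_sums = np.kron(np.eye(k_p), np.ones((1, k_q)))  # sum_j omega_ij = p_i
    col_sums = np.kron(np.ones((1, k_p)), np.eye(k_q))  # sum_i omega_ij = q_j
    res = linprog(
        cost.ravel(),
        A_eq=np.vstack([row_sums, col_sums]),
        b_eq=np.concatenate([p, q]),
        bounds=(0, None),
        method="highs",
    )
    return res.x.reshape(k_p, k_q), res.fun


# Toy usage: two 2-component GMMs in 2D with unit standard deviations.
p = np.array([0.5, 0.5])
q = np.array([0.5, 0.5])
m_p = np.array([[0.0, 0.0], [4.0, 4.0]])
m_q = np.array([[4.0, 5.0], [0.0, 1.0]])
s = np.ones((2, 2))
omega, cost = gmm_ot(p, m_p, s, q, m_q, s)
```

The linear program has only $K_P \times K_Q$ variables, so its size depends on the number of components rather than the number of samples, which is consistent with the "lighter" and "faster" claims.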

Paper Structure (15 sections, 2 theorems, 22 equations, 6 figures, 2 tables, 2 algorithms)


Key Result

Theorem 1

Let $P$ and $Q$ be two GMMs with components $P_{i} = \mathcal{N}(\mathbf{m}_{i}^{(P)}, (\mathbf{s}_{i}^{(P)})^{2})$ (resp. $Q_{j}$), and let $\omega^{\star}$ be the solution of the GMM-OT problem. The first-order optimality conditions of $\mathcal{MW}_{2}^{2}$ with respect to $\mathbf{m}_{i}$ and $\mathbf{s}_{i}$ are […].
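
The displayed conditions of Theorem 1 are not available in this summary. The following is a sketch of what first-order conditions of this kind look like; the theorem's exact statement may differ. It assumes that $\mathcal{MW}_{2}^{2}$ between axis-aligned GMMs is the $\omega^{\star}$-weighted sum of squared distances between component means and standard deviations, that $\omega^{\star}$ is held fixed, and that $p_{i}$ denotes the weight of component $P_{i}$. None of these assumptions is stated in the text above.

```latex
\begin{align*}
\mathcal{MW}_{2}^{2}(P, Q)
  &= \sum_{i,j} \omega^{\star}_{ij}
     \left( \|\mathbf{m}_{i}^{(P)} - \mathbf{m}_{j}^{(Q)}\|_{2}^{2}
          + \|\mathbf{s}_{i}^{(P)} - \mathbf{s}_{j}^{(Q)}\|_{2}^{2} \right), \\
\nabla_{\mathbf{m}_{i}^{(P)}} \mathcal{MW}_{2}^{2}
  &= 2 \sum_{j} \omega^{\star}_{ij}
     \left( \mathbf{m}_{i}^{(P)} - \mathbf{m}_{j}^{(Q)} \right) = 0
  \;\Longrightarrow\;
  \mathbf{m}_{i}^{(P)} = \sum_{j} \frac{\omega^{\star}_{ij}}{p_{i}} \mathbf{m}_{j}^{(Q)}, \\
\nabla_{\mathbf{s}_{i}^{(P)}} \mathcal{MW}_{2}^{2}
  &= 2 \sum_{j} \omega^{\star}_{ij}
     \left( \mathbf{s}_{i}^{(P)} - \mathbf{s}_{j}^{(Q)} \right) = 0
  \;\Longrightarrow\;
  \mathbf{s}_{i}^{(P)} = \sum_{j} \frac{\omega^{\star}_{ij}}{p_{i}} \mathbf{s}_{j}^{(Q)},
\end{align*}
% The last step on each line uses the marginal constraint
% \sum_{j} \omega^{\star}_{ij} = p_{i}.
```

Under these assumptions, each component of $P$ is mapped to a weighted average of the components of $Q$ it is coupled with, which is the GMM analogue of a barycentric mapping.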

Figures (6)

  • Figure 1: Overview of proposed methods. […] represent datasets, circles represent barycenters and triangles represent learned measures. Blue and orange elements represent labeled and unlabeled measures, respectively. In GMM-WBT, a labeled GMM is determined for the target domain by transporting the barycenter of the sources. In GMM-DaDiL, we learn to express each domain as a barycenter of learned GMMs, called atoms, through dictionary learning.
  • Figure 2: Illustration of axis-aligned GMMs. This hypothesis leads to GMMs that need more components to express the underlying data distribution.
  • Figure 3: Data and GMMs used in the toy experiment. In (a), each dataset was generated by applying an affine transformation to an initial dataset. In (b), we show an axis-aligned GMM fitted to the data via […]. In (c), we show a summary of GMM-WBT, where we show the plan between components (upper part) of $B$ (left) and $Q_{T}$ (right). The resulting labeled GMM is shown in the lower right part of (c).
  • Figure 4: Optimization and reconstruction summaries using GMM-DaDiL. In (a), we show the evolution of the loss, negative log-likelihood and barycentric coordinates (i.e., $\lambda_{\ell}$) over the course of optimization. In (b), we show the reconstructed GMMs (i.e., $\mathcal{B}(\lambda_{\ell},\mathcal{P})$) when the algorithm converges.
  • Figure 5: Lighter, Better, Faster. In (a), we analyse the performance of interpolations $\mathcal{B}((\lambda_{0},1-\lambda_{0});\mathcal{Q}_{S})$, $\mathcal{Q}_{S} = \{Q_{S_{1}}, Q_{S_{2}}\}$, and $\mathcal{B}((\lambda_{0},1-\lambda_{0});\mathcal{P})$ with learned $\mathcal{P} = \{P_{1}, P_{2}\}$, for GMM-WBT and GMM-DaDiL. In (b), we analyse the efficiency of barycenter-based methods under an increasing number of components (number of samples for […] and […]). […] has state-of-the-art performance even for the extreme case where $K = 65$. In (c), we compare the running time of […] with that of […], as a function of the number of components $K$ and the batch size $n_{b}$, respectively. This figure illustrates the speedup of […] as the number of samples in […] (and hence, $M = \lceil n/n_{b} \rceil$) increases. Circles represent the average over $5$ independent runs, while the error bars show $2$ times the standard deviation.
  • ...and 1 more figure

Theorems & Definitions (5)

  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Definition 1