Robust Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models

Ye Tian, Haolei Weng, Lucy Xia, Yang Feng

TL;DR

This work develops robust, EM-based multi-task and transfer learning procedures for Gaussian mixture models (GMMs). By incorporating a penalized objective that couples task discriminants and introducing cluster-label alignment steps, the authors obtain adaptive, outlier-robust learning with minimax-rate guarantees for parameter estimation and mis-clustering error. They show that, when tasks are related, multi-task learning improves over single-task approaches, while still performing competitively when tasks differ or include outliers; transfer learning similarly leverages related sources to enhance a target task. Theoretical results are complemented by extensive simulations and real-data studies (e.g., HAR and PRHD), and the supplementary material provides detailed alignment algorithms, multi-cluster extensions, and comprehensive proofs. Overall, the framework fills a gap in unsupervised MTL/TL theory for GMMs and offers practical algorithms with provable guarantees and robust performance across diverse settings.

Abstract

Unsupervised learning has been widely used in many real-world applications. One of the simplest and most important unsupervised learning models is the Gaussian mixture model (GMM). In this work, we study the multi-task learning problem on GMMs, which aims to leverage potentially similar GMM parameter structures among tasks to obtain improved learning performance compared to single-task learning. We propose a multi-task GMM learning procedure based on the EM algorithm that effectively utilizes unknown similarities between related tasks and is robust against a fraction of outlier tasks from arbitrary distributions. The proposed procedure is shown to achieve the minimax optimal rate of convergence for both parameter estimation error and the excess mis-clustering error, in a wide range of regimes. Moreover, we generalize our approach to tackle the problem of transfer learning for GMMs, where similar theoretical results are derived. Additionally, iterative unsupervised multi-task and transfer learning methods may suffer from an initialization alignment problem, and two alignment algorithms are proposed to resolve the issue. Finally, we demonstrate the effectiveness of our methods through simulations and real data examples. To the best of our knowledge, this is the first work studying multi-task and transfer learning on GMMs with theoretical guarantees.
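As a point of reference for the EM-based procedure described above, the following is a minimal sketch of plain single-task EM for a two-component GMM. It is not the authors' multi-task algorithm: it has no cross-task penalty and no robustness mechanism. The shared covariance, the principal-component initialization, and the function name `em_two_component_gmm` are assumptions made for illustration only.

```python
import numpy as np

def em_two_component_gmm(X, n_iter=50):
    """Single-task EM for a two-component GMM with a shared covariance.

    Illustrative baseline only; the shared-covariance model and the
    initialization below are assumptions, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    ridge = 1e-6 * np.eye(d)  # small ridge keeps the covariance invertible

    # Initialization: split the sample along the top principal component.
    sigma = np.atleast_2d(np.cov(X, rowvar=False)) + ridge
    _, vecs = np.linalg.eigh(sigma)          # eigenvalues in ascending order
    proj = (X - X.mean(axis=0)) @ vecs[:, -1]
    mu = np.stack([X[proj <= 0].mean(axis=0), X[proj > 0].mean(axis=0)])
    w = 0.5                                   # mixing weight of component 1

    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 1.
        prec = np.linalg.inv(sigma)
        d0, d1 = X - mu[0], X - mu[1]
        log0 = np.log(1 - w) - 0.5 * np.einsum("ij,jk,ik->i", d0, prec, d0)
        log1 = np.log(w) - 0.5 * np.einsum("ij,jk,ik->i", d1, prec, d1)
        gamma = 1.0 / (1.0 + np.exp(np.clip(log0 - log1, -500, 500)))

        # M-step: update the weight, the two means, and the shared covariance.
        w = float(np.clip(gamma.mean(), 1e-6, 1 - 1e-6))
        mu[0] = ((1 - gamma)[:, None] * X).sum(axis=0) / max((1 - gamma).sum(), 1e-12)
        mu[1] = (gamma[:, None] * X).sum(axis=0) / max(gamma.sum(), 1e-12)
        d0, d1 = X - mu[0], X - mu[1]
        sigma = (((1 - gamma)[:, None] * d0).T @ d0
                 + (gamma[:, None] * d1).T @ d1) / n + ridge

    return w, mu, sigma, gamma
```

Thresholding `gamma` at 0.5 gives cluster labels. These labels are identified only up to a swap of the two components, which is the source of the alignment problem that arises once several tasks are fitted jointly.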

Paper Structure

This paper contains 91 sections, 58 theorems, 409 equations, 22 figures, 4 tables, 8 algorithms.

Key Result

Theorem 1

(Upper bounds of the estimation error of GMM parameters for MTL-GMM) Suppose the assumption for the upper bound on the multi-task estimation error holds for some $S$ with $|S| \geq s$ and $\epsilon \coloneqq \frac{K-s}{K} < 1/3$. Let $\lambda^{[0]} \geq C_1\max_{k=1:K}\sqrt{n_k}$, $C_{\lambda} \geq C_1$ and $\kappa > C_{\ldots}$ ..., where $\kappa' \in (0, 1)$ is some constant and $n_S = \sum_{k \in S}n_k$. When $T \geq C\log(\max \ldots)$ ...

Figures (22)

  • Figure 1: Examples of well-aligned (left) and badly-aligned (right) initializations.
  • Figure 2: The performance of different methods in Simulation 1 under different outlier proportions. The upper panel shows the performance without outlier tasks ($\epsilon = 0$), and the lower panel shows the performance with two outlier tasks ($\epsilon = 0.2$). $h$ changes from 0 to 10 with increment 1. Estimation error of $\{\bm{\beta}^{(k)*}\}_{k \in S}$ stands for $\max_{k \in S}(\|\widehat{\bm{\beta}}^{(k)[T]} - \bm{\beta}^{(k)*}\|_{2} \wedge \|\widehat{\bm{\beta}}^{(k)[T]} + \bm{\beta}^{(k)*}\|_{2})$ and maximum mis-clustering error represents the maximum empirical mis-clustering error rate calculated on the test set of tasks in $S$.
  • Figure 3: Box plots of mis-clustering errors of 30 tasks for each method for HAR data set.
  • Figure S.4: An illustration of multi-component GMM with varying numbers of clusters across tasks.
  • Figure S.5: Comparison of NMI across different methods in Simulation 1 (no outliers), under varying numbers of clusters specified in the algorithms. The numbers in the method names (e.g., Single-task-GMM-2, MTL-GMM-3) indicate the specified cluster number. In Simulation 1, the true number of clusters is 2.
  • ...and 17 more figures
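The two metrics named in the Figure 2 caption can be sketched as follows. The estimation error follows the caption's formula directly: the smaller of $\|\widehat{\bm{\beta}} - \bm{\beta}^*\|_2$ and $\|\widehat{\bm{\beta}} + \bm{\beta}^*\|_2$, maximized over tasks in $S$. For the mis-clustering error, taking the minimum over relabelings of the clusters is an assumption here (a common convention in clustering evaluation), and all function names are hypothetical.

```python
import numpy as np
from itertools import permutations

def sign_invariant_error(beta_hat, beta_star):
    """min(||b_hat - b*||_2, ||b_hat + b*||_2), as in the Figure 2 caption."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    beta_star = np.asarray(beta_star, dtype=float)
    return float(min(np.linalg.norm(beta_hat - beta_star),
                     np.linalg.norm(beta_hat + beta_star)))

def max_estimation_error(beta_hats, beta_stars):
    """Maximum of the sign-invariant error over the tasks supplied."""
    return max(sign_invariant_error(bh, bs)
               for bh, bs in zip(beta_hats, beta_stars))

def misclustering_error(pred, truth, n_clusters=2):
    """Empirical mis-clustering rate, minimized over cluster relabelings.

    The minimization over permutations is an assumption of this sketch.
    Labels are expected to be integers in 0..n_clusters-1.
    """
    pred = np.asarray(pred)
    truth = np.asarray(truth)
    best = 1.0
    for perm in permutations(range(n_clusters)):
        relabeled = np.array(perm)[pred]
        best = min(best, float(np.mean(relabeled != truth)))
    return best
```

The maximum mis-clustering error in the caption would then be the largest value of `misclustering_error` across the test sets of the tasks in $S$.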

Theorems & Definitions (92)

  • Remark 1
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5: Alignment correctness for the exhaustive alignment algorithm
  • Remark 2
  • Remark 3
  • Theorem 6: Alignment correctness for the greedy alignment algorithm
  • Remark 4
  • ...and 82 more
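To make the alignment problem behind Theorems 5 and 6 concrete, here is a hypothetical sketch for the two-cluster case, where each task's initial discriminant vector is identified only up to sign. It searches all sign patterns and keeps the one with the smallest total pairwise distance. The score function and all names are assumptions for illustration; the paper's exhaustive and greedy alignment algorithms are not reproduced in this summary and may differ.

```python
import numpy as np
from itertools import product

def alignment_score(betas, signs):
    """Sum of pairwise Euclidean distances after applying the sign flips.

    This score is an assumption of the sketch, not the paper's criterion.
    """
    flipped = [s * np.asarray(b, dtype=float) for s, b in zip(signs, betas)]
    K = len(flipped)
    return sum(np.linalg.norm(flipped[i] - flipped[j])
               for i in range(K) for j in range(i + 1, K))

def exhaustive_sign_alignment(betas):
    """Search all sign patterns and return the one with the lowest score.

    The first task's sign is fixed to +1 because flipping every task at
    once leaves the score unchanged, so 2^(K-1) patterns are examined.
    """
    K = len(betas)
    best_signs, best_score = None, np.inf
    for rest in product((1, -1), repeat=K - 1):
        signs = (1,) + rest
        score = alignment_score(betas, signs)
        if score < best_score:
            best_signs, best_score = signs, score
    return best_signs
```

The search cost grows exponentially in the number of tasks, which suggests why a cheaper greedy alternative is also of interest.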