Adaptive Transfer Clustering: A Unified Framework

Yuqi Gu, Zhongyuan Lyu, Kaizheng Wang

TL;DR

An adaptive transfer clustering algorithm automatically leverages the commonality between a main and an auxiliary dataset in the presence of unknown discrepancy by optimizing an estimated bias-variance decomposition. It applies to a broad class of statistical models, including Gaussian mixture models, stochastic block models, and latent class models.

Abstract

We propose a general transfer learning framework for clustering given a main dataset and an auxiliary one about the same subjects. The two datasets may reflect similar but different latent grouping structures of the subjects. We propose an adaptive transfer clustering (ATC) algorithm that automatically leverages the commonality in the presence of unknown discrepancy, by optimizing an estimated bias-variance decomposition. It applies to a broad class of statistical models including Gaussian mixture models, stochastic block models, and latent class models. A theoretical analysis proves the optimality of ATC under the Gaussian mixture model and explicitly quantifies the benefit of transfer. Extensive simulations and real data experiments confirm our method's effectiveness in various scenarios.
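To make the setting concrete, the following is a minimal sketch, not the ATC algorithm itself. It simulates a main and an auxiliary Gaussian dataset on the same subjects whose labels disagree with probability `eps`, and compares two baselines named in the paper's examples: independent task learning (cluster the main data alone) and data pooling (cluster the concatenated features). The function names, the plain 2-means routine, the farthest-point initialization, and the parameter values are all illustrative assumptions rather than the authors' implementation.

```python
import math
import random


def simulate(n, d, signal, eps, seed=0):
    """Main and auxiliary Gaussian data on the same n subjects (assumed setup).

    Labels are +/-1; each auxiliary label is flipped with probability eps.
    Cluster means are +/- (signal / sqrt(d)) * (1, ..., 1); noise is N(0, I).
    """
    rng = random.Random(seed)
    z0 = [rng.choice((-1, 1)) for _ in range(n)]          # main labels
    z1 = [-z if rng.random() < eps else z for z in z0]    # auxiliary labels
    shift = signal / math.sqrt(d)

    def draw(labels):
        return [[z * shift + rng.gauss(0.0, 1.0) for _ in range(d)]
                for z in labels]

    return draw(z0), draw(z1), z0, z1


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iters=50):
    """Plain Lloyd iterations with K = 2; returns labels in {-1, +1}."""
    c0 = points[0]
    c1 = max(points, key=lambda p: sqdist(p, c0))  # farthest-point init
    centers = [c0, c1]
    labels = []
    for _ in range(iters):
        new = [0 if sqdist(p, centers[0]) <= sqdist(p, centers[1]) else 1
               for p in points]
        if new == labels:
            break
        labels = new
        for k in (0, 1):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return [1 if l == 1 else -1 for l in labels]


def clustering_error(est, truth):
    """Misclassification rate, minimized over the two label permutations."""
    mismatches = sum(e != t for e, t in zip(est, truth))
    return min(mismatches, len(truth) - mismatches) / len(truth)


def compare(eps, n=200, d=5, signal=2.0, seed=0):
    """Errors on the main labels: (independent task learning, data pooling)."""
    x0, x1, z0, _ = simulate(n, d, signal, eps, seed)
    err_itl = clustering_error(two_means(x0), z0)
    pooled = [a + b for a, b in zip(x0, x1)]   # concatenate features per subject
    err_pool = clustering_error(two_means(pooled), z0)
    return err_itl, err_pool


if __name__ == "__main__":
    for eps in (0.0, 0.1, 0.3):
        print(eps, compare(eps))
```

In this toy setup one would expect pooling to help when `eps` is near zero and to degrade as `eps` grows. That trade-off is what an adaptive method such as ATC is designed to navigate without knowing the discrepancy in advance.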

Paper Structure

This paper contains 39 sections, 23 theorems, 235 equations, 8 figures, 2 tables, and 3 algorithms.

Key Result

Proposition 1

For each $i\in[n]$, we have [...]. Consequently, we have [...].

Figures (8)

  • Figure 1: Comparison with the hypothesis test's detection boundary. Below the red curve $\alpha=\alpha^*(r)$, i.e., in Region ${\cal R}_{1,\sf det}\cup{\cal R}_{2,\sf det}$, a consistent test exists for \ref{eq:testing}; above the red curve, i.e., in Region ${\cal R}_{\sf undet}$, no consistent test exists for \ref{eq:testing}. Region ${\cal R}_{\sf undet}$: transfer clustering performs as well as data pooling when no consistent test can distinguish $\mathbf{Z}_0^*$ and $\mathbf{Z}_1^*$. Region ${\cal R}_{1,\sf det}$: transfer clustering beats data pooling while a consistent test can distinguish $\mathbf{Z}_0^*$ and $\mathbf{Z}_1^*$. Region ${\cal R}_{2,\sf det}$: transfer clustering performs as well as data pooling while a consistent test can distinguish $\mathbf{Z}_0^*$ and $\mathbf{Z}_1^*$.
  • Figure 2: Clustering error vs. discrepancy $\varepsilon$. (Left) Target: GMM; Source: GMM. Parameters are set to $(n,K,d,\upsilon)=(500, 2, 10, 0.54)$. (Right) Target: SBM; Source: GMM. Parameters are set to $(n,K,d,p,q,\upsilon)=(300, 2, 10, 0.4, 0.3, 0.79)$. Each error point is the average of 50 replications.
  • Figure 3: Clustering error vs. discrepancy $\varepsilon$. (Left) Target: GMM; Source: SBM. Parameters are set to $(n,K,p,q,d,\upsilon)=(300, 2, 0.5, 0.3, 10, 0.75)$. (Right) Target: LCM; Source: GMM. Parameters are set to $(n,K,d,\Delta, \upsilon)=(200, 2, 10, 0.1, 0.76)$. Each error point is the average of 50 replications.
  • Figure 4: Clustering error vs. discrepancy $\varepsilon$. Target: LCM; Source: LCM. Parameters are set to $(n,K,d,\Delta)=(300, 2, 15, 0.1)$. Each error point is the average of 50 replications.
  • Figure 5: Clustering error vs. penalty $\lambda$ for the Lawyers Network. Target: years with the firm (GMM); Source: strong-coworker network (SBM).
  • ...and 3 more figures

Theorems & Definitions (34)

  • Example 2.1: Independent task learning
  • Example 2.2: Data pooling
  • Proposition 1
  • Theorem 1
  • Theorem 2
  • Remark 1: Comparison with testing
  • Theorem 3: Lower bound
  • Lemma 1
  • Theorem 4
  • Corollary 1
  • ...and 24 more