Understanding Transfer Learning via Mean-field Analysis

Gholamali Aminian, Łukasz Szpruch, Samuel N. Cohen

TL;DR

This work considers two main transfer learning scenarios, $\alpha$-ERM and fine-tuning with KL-regularized empirical risk minimization, and establishes generic conditions under which convergence rates of the generalization error and the population risk are derived for these scenarios.

Abstract

We propose a novel framework for exploring generalization errors of transfer learning through the lens of differential calculus on the space of probability measures. In particular, we consider two main transfer learning scenarios, $\alpha$-ERM and fine-tuning with KL-regularized empirical risk minimization, and establish generic conditions under which convergence rates of the generalization error and the population risk are derived for these scenarios. Based on our theoretical results, we show the benefits of transfer learning with a one-hidden-layer neural network in the mean-field regime under suitable integrability and regularity assumptions on the loss and activation functions.

Paper Structure

This paper contains 32 sections, 24 theorems, 171 equations, 3 figures, 2 tables.

Key Result

Lemma 1

Consider a generic loss function $(m,z)\mapsto \ell(m,z)$ and $(\nu_{n_t,(1)}^t,\nu_{n_s}^s)$ as defined in Eq. (nu replace one). The WTGE (Eq. WTGE) is given by,

Figures (3)

  • Figure 1: Overview of Transfer Learning Results
  • Figure 2: Fine-tuning Scenario
  • Figure 3:

Theorems & Definitions (51)

  • Definition 1
  • Remark 1
  • Lemma 1
  • Theorem 1
  • Theorem 2: $\alpha$-ERM
  • Theorem 3: Fine-tuning
  • Remark 2: Convergence rate
  • Definition 2
  • Remark 3
  • Theorem 4: WTGE of the $\alpha$-ERM
  • ...and 41 more