Global Convergence of Continual Learning on Non-IID Data

Fei Zhu, Yujing Liu, Wenzhuo Liu, Zhaoxiang Zhang

TL;DR

This paper gives a general theoretical analysis of continual learning for regression models. Using stochastic Lyapunov functions and martingale estimation techniques, it establishes, for the first time, almost sure convergence of continual learning under a general data condition.

Abstract

Continual learning, which aims to learn multiple tasks sequentially, has gained extensive attention. However, most existing work focuses on empirical studies, and the theoretical aspect remains under-explored. Recently, a few investigations have considered the theory of continual learning, but only for linear regression, and they establish their results under the strict independent and identically distributed (i.i.d.) assumption and a persistent excitation condition on the feature data, both of which may be difficult to verify or guarantee in practice. To overcome this fundamental limitation, we provide a general and comprehensive theoretical analysis for continual learning of regression models. By utilizing the stochastic Lyapunov function and martingale estimation techniques, we establish almost sure convergence results for continual learning under a general data condition for the first time. Additionally, without any excitation condition imposed on the data, we provide convergence rates for the forgetting and regret metrics.

Paper Structure

This paper contains 15 sections, 4 theorems, 27 equations, 2 figures, 2 algorithms.

Key Result

Theorem 1

Under Assumptions assum2-assum1, the estimation error generated by Algorithm alg1 has the following upper bound as $m\to\infty$: where $\widetilde{\bm{w}}_m = \bm{w}_m - \bm{w}^{*}$ and $\lambda_{\min}(m) = \lambda_{\min}\left\{\mathbf{Q}_0 + \sum\limits_{t=1}^{m} \sum\limits_{i=0}^{n_t} \bm{x}_{t,i} \bm{x}_{t,i}^\top\right\}.$
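
The quantity $\lambda_{\min}(m)$ in Theorem 1 can be made concrete with a small numerical sketch. The Python snippet below is illustrative only and is not taken from the paper: the function name `min_eigenvalue_of_information_matrix`, the toy matrix `Q0`, and the feature arrays are all assumptions. It accumulates $\mathbf{Q}_0$ plus the outer products $\bm{x}_{t,i}\bm{x}_{t,i}^\top$ over the first $m$ tasks and returns the smallest eigenvalue of the result.

```python
import numpy as np

def min_eigenvalue_of_information_matrix(Q0, task_features):
    """Return lambda_min(Q0 + sum_t sum_i x_{t,i} x_{t,i}^T).

    Q0            : (d, d) symmetric matrix (hypothetical initial matrix).
    task_features : list of arrays, one per task, each of shape (n_samples, d),
                    where every row is one feature vector x_{t,i}.
    """
    Q = np.array(Q0, dtype=float)
    for X in task_features:
        X = np.asarray(X, dtype=float)
        # X.T @ X equals the sum of the outer products of the rows of X.
        Q = Q + X.T @ X
    # eigvalsh handles symmetric matrices and returns eigenvalues in ascending order.
    return float(np.linalg.eigvalsh(Q)[0])

# Toy data (assumed): task 1 only excites the first coordinate,
# task 2 only excites the second coordinate.
Q0 = np.eye(2)
task1 = np.array([[1.0, 0.0], [1.0, 0.0]])
task2 = np.array([[0.0, 2.0]])

lam_after_task1 = min_eigenvalue_of_information_matrix(Q0, [task1])
lam_after_task2 = min_eigenvalue_of_information_matrix(Q0, [task1, task2])
```

In this toy case the smallest eigenvalue stays at the level set by `Q0` until some task supplies features along the second coordinate, which shows how $\lambda_{\min}(m)$ tracks the least-excited direction of the accumulated feature data.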

Figures (2)

  • Figure 1: Illustration of the continual learning scenarios investigated in this paper. (a) All tasks share the same global minimizer $\bm{w}^*$. (b) A more general setting that relaxes Case 1 to the existence of an approximate common global minimizer. The solid line with an arrow denotes the optimization trajectory of the CL algorithm. The dashed line denotes naive sequential fine-tuning, which suffers from catastrophic forgetting.
  • Figure 2: Numerical demonstration of Case 2: (a) SGD suffers from catastrophic forgetting when continually learning 100 tasks. (b) Our algorithm successfully finds the approximate common global minimizer under a sequential learning order and (c) under a random learning order.
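
The forgetting behaviour described in the captions can be reproduced in miniature. The sketch below is a toy illustration under stated assumptions and is not the paper's experiment or algorithm: it uses two hypothetical one-dimensional regression tasks instead of 100, full-batch gradient descent instead of SGD, and a joint least-squares fit (via `np.linalg.lstsq`) as a simple stand-in for a parameter that serves both tasks. All names and data are invented for illustration.

```python
import numpy as np

def fine_tune(w, X, y, lr=0.1, steps=200):
    """Plain gradient descent on the mean squared error of a single task."""
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def mse(w, X, y):
    """Mean squared error of the linear model w on the data (X, y)."""
    return float(np.mean((X @ w - y) ** 2))

# Two hypothetical tasks whose individual minimizers differ (w = 1 and w = 3).
X1, y1 = np.array([[1.0], [2.0]]), np.array([1.0, 2.0])
X2, y2 = np.array([[1.0], [2.0]]), np.array([3.0, 6.0])

# Naive sequential fine-tuning: fit task 1, then continue training on task 2 only.
w = fine_tune(np.zeros(1), X1, y1)
loss_task1_before = mse(w, X1, y1)   # task 1 is fitted almost exactly
w = fine_tune(w, X2, y2)
loss_task1_after = mse(w, X1, y1)    # task 1 loss rises after training on task 2

# Stand-in for a shared solution: least squares on the pooled data of both tasks.
w_joint = np.linalg.lstsq(np.vstack([X1, X2]),
                          np.concatenate([y1, y2]), rcond=None)[0]
joint_losses = (mse(w_joint, X1, y1), mse(w_joint, X2, y2))
```

After the second round of fine-tuning the parameter sits at the task 2 minimizer, so the task 1 loss jumps from nearly zero to a large value, whereas the pooled fit keeps both task losses moderate and equal in this symmetric toy setup.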

Theorems & Definitions (13)

  • Definition 1
  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Theorem 1
  • Remark 5
  • Theorem 2
  • Remark 6
  • Theorem 3
  • ...and 3 more