Self-Regularized Learning Methods

Max Schölpple, Liu Fanghui, Ingo Steinwart

Abstract

We introduce a general framework for analyzing learning algorithms based on the notion of self-regularization, which captures implicit complexity control without requiring explicit regularization. This is motivated by previous observations that many algorithms, such as gradient-descent-based learning, exhibit implicit regularization. In a nutshell, for a self-regularized algorithm the complexity of the predictor is inherently controlled by that of the simplest comparator achieving the same empirical risk. This framework is sufficiently rich to cover both classical regularized empirical risk minimization and gradient descent. Building on self-regularization, we provide a thorough statistical analysis of such algorithms, including minimax-optimal rates. For this analysis it suffices to show that the algorithm is self-regularized; all further requirements stem from the learning problem itself. Finally, we discuss the problem of data-dependent hyperparameter selection, providing a general result that yields minimax-optimal rates up to a double-logarithmic factor and covers data-driven early stopping for RKHS-based gradient descent.

Paper Structure

This paper contains 14 sections, 20 theorems, and 234 equations.

Key Result

Theorem 4

Let $H$ be an RKHS and let the loss function $L$ be convex and $M$-smooth. Define $M' \coloneqq M \|H\hookrightarrow \mathcal{L}_{\infty}(X)\|^2$. Let the step sizes $(\eta_k)_{k\in\mathbb{N}_0}$ fulfill $\eta_k \le 1/M'$ for all $k\in\mathbb{N}_0$ and $\sum_{k=0}^{\infty} \eta_k = \infty$, and let

Theorems & Definitions (44)

  • Definition 1: Self-regularized learning
  • Example 2
  • Definition 3: $M$-smooth functional on RKHS
  • Theorem 4: Gradient descent in RKHS is self-regularized
  • Theorem 5
  • Theorem 6
  • Definition 7
  • Theorem 8: Abstract cross-validation
  • Theorem 9
  • Proof of the example labeled `ex:rerm-is--self-reg` (regularized empirical risk minimization is self-regularized)
  • ...and 34 more
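
To make the setting of Theorem 4 more concrete, the following is a minimal sketch of gradient descent in an RKHS, not the authors' implementation. It assumes a Gaussian kernel, for which $k(x,x)=1$ and hence the embedding norm $\|H\hookrightarrow \mathcal{L}_{\infty}(X)\|$ is at most 1, and the half squared loss $L(y,t) = (y-t)^2/2$, which is convex and $M$-smooth with $M=1$, so that $M' = 1$. The function names `gaussian_kernel` and `kernel_gradient_descent`, the synthetic data, and the constant step size $\eta_k = 1/M'$ are illustrative choices that do not come from the paper.

```python
import numpy as np


def gaussian_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix with entries exp(-gamma * ||a - b||^2)."""
    sq = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kernel_gradient_descent(K, y, eta, n_steps):
    """Gradient descent on the empirical half-squared-loss risk, started at f_0 = 0.

    The iterate is f_k = sum_i alpha_i k(x_i, .). The RKHS gradient of the
    empirical risk is (1/n) sum_i (f(x_i) - y_i) k(x_i, .), so one step
    changes the coefficients by -(eta / n) * (K @ alpha - y).
    """
    n = len(y)
    alpha = np.zeros(n)
    risks = []
    for _ in range(n_steps):
        residual = K @ alpha - y
        # Empirical risk of the current iterate, recorded before the update.
        risks.append(0.5 * np.mean(residual**2))
        alpha = alpha - (eta / n) * residual
    return alpha, risks


# Synthetic one-dimensional regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(50)

K = gaussian_kernel(X, X)
M_prime = 1.0 * 1.0**2  # assumed: M = 1 and embedding norm bounded by 1
alpha, risks = kernel_gradient_descent(K, y, eta=1.0 / M_prime, n_steps=200)
```

Under these assumptions the recorded empirical risks should be non-increasing in the iteration count. The number of steps plays the role of the hyperparameter that a data-driven early-stopping rule, as mentioned in the abstract, would select.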