Self-Regularized Learning Methods

Max Schölpple; Liu Fanghui; Ingo Steinwart

Abstract

We introduce a general framework for analyzing learning algorithms based on the notion of self-regularization, which captures implicit complexity control without requiring explicit regularization. This is motivated by previous observations that many algorithms, such as gradient-descent-based learning, exhibit implicit regularization. In a nutshell, for a self-regularized algorithm the complexity of the predictor is inherently controlled by that of the simplest comparator achieving the same empirical risk. This framework is sufficiently rich to cover both classical regularized empirical risk minimization and gradient descent. Building on self-regularization, we provide a thorough statistical analysis of such algorithms, including minimax-optimal rates. For this analysis it suffices to show that the algorithm is self-regularized; all further requirements stem from the learning problem itself. Finally, we discuss the problem of data-dependent hyperparameter selection, providing a general result which yields minimax-optimal rates up to a double logarithmic factor and covers data-driven early stopping for RKHS-based gradient descent.

Paper Structure (14 sections, 20 theorems, 234 equations)

Introduction
Preliminaries
Self-Regularization: Definition and Examples
Statistical analysis: Oracle inequalities
Statistical analysis: Data-dependent parameter choice
Proofs
Proofs for Section \ref{sec:self-reg} -- Self-Regularization
Proofs for Section \ref{sec:statistical_analysis} -- Preparations
Proofs for Section \ref{sec:statistical_analysis} -- Theorem \ref{thm:simple-analysis}
Proofs for Section \ref{sec:statistical_analysis} -- Theorem \ref{thm:learning_rates_tempered_C_minimal_general_form}
Proofs for Section \ref{sec:cross-val_for_gd}
An Oracle Inequality for RERMs under Minimal Assumptions
A Refined Oracle Inequality for Clipped RERMs
Miscellaneous Material

Key Result

Theorem 4

Let $H$ be an RKHS and let the loss function $L$ be convex and $M$-smooth. Define $M' \coloneqq M \|H\hookrightarrow \mathcal{L}_{\infty}(X)\|^2$. Let the step sizes $(\eta_k)_{k\in\mathbb{N}_0}$ fulfill $\eta_k \le 1/M'$ for all $k\in\mathbb{N}_0$ and $\sum_{k=0}^{\infty} \eta_k = \infty$, and let

Theorems & Definitions (44)

Definition 1: Self-regularized learning
Example 2
Definition 3: $M$-smooth functional on RKHS
Theorem 4: Gradient descent in RKHS is self-regularized
Theorem 5
Theorem 6
Definition 7
Theorem 8: Abstract cross-validation
Theorem 9
Proof of \ref{ex:rerm-is--self-reg}
...and 34 more
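
The setting of Theorem 4 (gradient descent in an RKHS with step sizes $\eta_k \le 1/M'$) can be illustrated with a minimal sketch. Everything concrete below is an assumption chosen for illustration and is not taken from the paper: the least squares loss $L(y,t) = (y-t)^2/2$ (convex and $1$-smooth, so $M = 1$), a Gaussian kernel (for which $k(x,x) = 1$, so $\|H\hookrightarrow \mathcal{L}_{\infty}(X)\| = 1$ and $M' = 1$), the synthetic data, the start at $f_0 = 0$, and the helper names `gaussian_kernel` and `kernel_gd`. The sketch only tracks the empirical risk and the RKHS norm of the iterates; it does not verify the self-regularization property of Definition 1, which is not reproduced here.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_gd(K, y, eta, steps):
    """Gradient descent on the empirical least squares risk in the RKHS.

    The iterate is f = sum_i alpha[i] * k(x_i, .), started at f = 0.
    The RKHS gradient of R(f) = (1/n) sum_i (f(x_i) - y_i)^2 / 2 is
    (1/n) sum_i (f(x_i) - y_i) k(x_i, .), so the update acts on alpha directly.
    """
    n = len(y)
    alpha = np.zeros(n)
    risks, norms = [], []
    for _ in range(steps):
        residual = K @ alpha - y                    # f(x_i) - y_i
        risks.append(0.5 * np.mean(residual**2))    # empirical risk of f
        norms.append(np.sqrt(max(alpha @ K @ alpha, 0.0)))  # RKHS norm of f
        alpha = alpha - eta * residual / n          # one gradient step
    return alpha, np.array(risks), np.array(norms)

# Synthetic data (assumption, for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(40, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(40)

K = gaussian_kernel(X, X)
M_prime = 1.0 * 1.0**2   # M * ||H -> L_inf||^2 under the assumptions above
alpha, risks, norms = kernel_gd(K, y, eta=1.0 / M_prime, steps=200)
```

With a constant step size $\eta_k = 1/M'$ the two step-size conditions stated in Theorem 4 hold, and the recorded `risks` should not increase from one iteration to the next, while `norms` records how the complexity of the iterate evolves along the path.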
