Highly Adaptive Ridge

Alejandro Schuler; Alexander Hagemeister; Mark van der Laan

TL;DR

A regression method that achieves a dimension-free L2 convergence rate in the class of right-continuous functions with square-integrable sectional derivatives, and that empirically outperforms state-of-the-art algorithms, particularly on small datasets.

Abstract

In this paper we propose the Highly Adaptive Ridge (HAR): a regression method that achieves an $n^{-1/3}$ dimension-free L2 convergence rate in the class of right-continuous functions with square-integrable sectional derivatives. This is a large nonparametric function class that is particularly appropriate for tabular data. HAR is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion. We use simulation and real data to confirm our theory, and we demonstrate empirical performance better than state-of-the-art algorithms, particularly on small datasets.
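The kernel ridge formulation can be illustrated with a minimal sketch. It assumes (this is not stated in the abstract) that the saturated zero-order basis consists of the indicator products $\prod_{j \in s} 1(x_j \ge X_{ij})$, one for each training point $X_i$ and each nonempty coordinate subset $s$. Under that assumption, summing over subsets gives the closed form $K(x, x') = \sum_i (2^{c_i(x, x')} - 1)$, where $c_i(x, x')$ counts the coordinates $j$ with $X_{ij} \le \min(x_j, x'_j)$. The names `har_kernel`, `har_fit`, `har_predict`, and the penalty `lam` are hypothetical, and the sketch omits any intercept or penalty tuning the paper may use.

```python
import numpy as np

def har_kernel(A, B, knots):
    """Assumed kernel: K[a, b] = sum_i (2**c_i - 1), where c_i counts the
    coordinates j with knots[i, j] <= min(A[a, j], B[b, j])."""
    # Coordinate-wise minimum of every pair of rows: shape (nA, nB, p).
    m = np.minimum(A[:, None, :], B[None, :, :])
    # For each pair and each knot, count the coordinates the knot sits below.
    counts = (knots[None, None, :, :] <= m[:, :, None, :]).sum(axis=-1)
    # Sum over nonempty subsets of those coordinates, then over knots.
    return (2.0 ** counts - 1.0).sum(axis=-1)

def har_fit(X, y, lam):
    """Kernel ridge dual coefficients: solve (K + lam * I) alpha = y."""
    K = har_kernel(X, X, X)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def har_predict(X_new, X, alpha):
    """Predictions at new points from the fitted dual coefficients."""
    return har_kernel(X_new, X, X) @ alpha
```

Because the kernel is evaluated in closed form, the cost depends on the number of observations and covariates rather than on the number of basis functions, which grows exponentially in the dimension. The broadcasting above uses memory proportional to nA × nB × n × p, so it is only suitable for small problems.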

Paper Structure (27 sections, 8 theorems, 43 equations, 2 figures, 2 tables)

Introduction
Notation and Preliminaries
Motivation for This Function Class
Method
Convergence Rate
Computation
Higher-Order HAR
Related Work
Demonstration
Convergence Rate in Simulation
Empirical Performance
Discussion
Proof of Rate Result
Loss Assumptions
Oracle Approximation
...and 12 more sections

Key Result

Theorem 1

Define the "truth" $f = \mathop{\mathrm{arg\,min}}\limits_{\{g:[0,1]^p \to \mathbb{R}\}} \mathbb{P} L g$ for a loss function $L$. Let our model be $\mathscr F_n(M_n) = \{H(x)^\top\beta : \|\beta\|^2 \le M_{n}\}$ and our estimate be $\hat{f}_n = \mathop{\mathrm{arg\,min}} \ldots$

Figures (2)

Figure 1: Fits of HAR and other methods on simple one-dimensional data.
Figure 2: Convergence of HAR relative to theorized rate.

Theorems & Definitions (19)

Theorem 1
Example 1: Squared Error
proof
Example 2: Logistic Regression
proof
Lemma 1
proof
Lemma 2
proof
Theorem 2
...and 9 more
