Highly Adaptive Ridge

Alejandro Schuler, Alexander Hagemeister, Mark van der Laan

TL;DR

A regression method that achieves a dimension-free $n^{-1/3}$ L2 convergence rate in the class of right-continuous functions with square-integrable sectional derivatives, and that empirically outperforms state-of-the-art algorithms, particularly on small datasets.

Abstract

In this paper we propose the Highly Adaptive Ridge (HAR): a regression method that achieves an $n^{-1/3}$ dimension-free L2 convergence rate in the class of right-continuous functions with square-integrable sectional derivatives. This is a large nonparametric function class that is particularly appropriate for tabular data. HAR is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion. We use simulated and real data to confirm our theory. Empirically, HAR outperforms state-of-the-art algorithms, particularly on small datasets.
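
The kernel-ridge characterization above can be illustrated with a minimal Python sketch. It assumes the zero-order basis consists of indicators $\prod_{j \in s} 1(x_j \ge X_{ij})$, with one knot per training point $i$ and nonempty coordinate subset $s$. Under that assumption the inner product of two basis vectors collapses to $\sum_i (2^{m_i} - 1)$, where $m_i$ counts the coordinates in which knot $i$ lies at or below both points. The function names `har_kernel` and `har_basis`, the fixed penalty `lam`, the omission of an intercept, and the synthetic data are all illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations

import numpy as np


def har_kernel(A, B, knots):
    """Kernel matrix K[a, b] = sum_i (2**m_i - 1) (assumed form, see lead-in).

    m_i is the number of coordinates j with knots[i, j] <= min(A[a, j], B[b, j]).
    """
    # le_A[a, i, j] is True when knot i is at or below point a in coordinate j
    le_A = knots[None, :, :] <= A[:, None, :]
    le_B = knots[None, :, :] <= B[:, None, :]
    # m[a, b, i] counts coordinates where knot i is at or below both points
    m = np.einsum("aij,bij->abi", le_A.astype(float), le_B.astype(float))
    return (2.0 ** m - 1.0).sum(axis=2)


def har_basis(A, knots):
    """Explicit indicator basis with n * (2**p - 1) columns; small inputs only."""
    p = knots.shape[1]
    le = knots[None, :, :] <= A[:, None, :]
    cols = []
    for size in range(1, p + 1):
        for s in combinations(range(p), size):
            # Indicator that knot i is at or below the point in every coordinate of s
            cols.append(le[:, :, list(s)].all(axis=2))
    return np.concatenate(cols, axis=1).astype(float)


# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X = rng.uniform(size=(30, 3))
y = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=30)
X_new = rng.uniform(size=(5, 3))

lam = 1.0  # hypothetical fixed ridge penalty

# Kernel ridge fit: solve (K + lam * I) alpha = y, then predict with K(new, train)
K = har_kernel(X, X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
pred = har_kernel(X_new, X, X) @ alpha
```

The explicit basis is included only so the kernel can be checked against `H @ H.T` on small inputs; the kernel form avoids materializing the $n(2^p - 1)$ basis columns. In practice the penalty would be tuned, for example by cross-validation, rather than fixed.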

Paper Structure

This paper contains 27 sections, 8 theorems, 43 equations, 2 figures, 2 tables.

Key Result

Theorem 1

Define the "truth" $f = \mathop{\mathrm{arg\,min}}\limits_{\{g:[0,1]^p \to \mathbb{R}\}} \mathbb{P} L g$ for a loss function $L$. Let our model be $\mathscr{F}_n(M_n) = \{H(x)^\top\beta : \|\beta\|^2 \le M_n\}$ and our estimate be $\hat{f}_n = \mathop{\mathrm{arg\,min}} \dots$

Figures (2)

  • Figure 1: Fits of HAR and other methods on simple one-dimensional data.
  • Figure 2: Convergence of HAR relative to theorized rate.

Theorems & Definitions (19)

  • Theorem 1
  • Example 1: Squared Error
  • proof
  • Example 2: Logistic Regression
  • proof
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Theorem 2
  • ...and 9 more