Near-optimal delta-convex estimation of Lipschitz functions

Gábor Balázs

TL;DR

The paper tackles nonparametric regression for Lipschitz functions under random design, introducing delta-convex fitting (DCF) to approximate Lipschitz targets via a nonlinear feature expansion of max-affine form. It combines adaptive center selection (AFPC), convex empirical risk minimization over a DC function class, and a two-stage refinement to produce a near-minimax estimator with respect to the intrinsic dimension $d_*$, without knowing the true Lipschitz constant $\lambda_*$. The authors establish PAC guarantees and a near-minimax convergence rate up to polylog factors under subgaussian covariates and noise, and they show how DCF readily adapts to convex shape-restricted regression. Empirically, DCF achieves competitive performance against both theory-grounded baselines and modern tree-based methods, while providing a tractable, scalable framework for Lipschitz-function estimation in high-dimensional settings.

Abstract

This paper presents a tractable algorithm for estimating an unknown Lipschitz function from noisy observations and establishes an upper bound on its convergence rate. The approach extends max-affine methods from convex shape-restricted regression to the more general Lipschitz setting. A key component is a nonlinear feature expansion that maps max-affine functions into a subclass of delta-convex functions, which act as universal approximators of Lipschitz functions while preserving their Lipschitz constants. Leveraging this property, the estimator attains the minimax convergence rate (up to logarithmic factors) with respect to the intrinsic dimension of the data under squared loss and subgaussian distributions in the random design setting. The algorithm integrates adaptive partitioning to capture intrinsic dimension, a penalty-based regularization mechanism that removes the need to know the true Lipschitz constant, and a two-stage optimization procedure combining a convex initialization with local refinement. The framework is also straightforward to adapt to convex shape-restricted regression. Experiments demonstrate competitive performance relative to other theoretically justified methods, including nearest-neighbor and kernel-based regressors.
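
For orientation, the three function classes named in the abstract can be written out as follows. These are the standard textbook definitions, with symbols ($K$, $a_k$, $b_k$, $g_1$, $g_2$) chosen here for illustration; the specific delta-convex subclass and feature expansion used by the paper are not reproduced.

```latex
% f is \lambda-Lipschitz on \mathcal{X} with respect to a norm \lVert\cdot\rVert:
\lvert f(x) - f(z) \rvert \le \lambda \lVert x - z \rVert
  \quad \text{for all } x, z \in \mathcal{X}.

% A max-affine function with K affine pieces (always convex):
g(x) = \max_{k \in [K]} \bigl( a_k^\top x + b_k \bigr).

% A delta-convex (DC) function is a difference of two convex functions:
h(x) = g_1(x) - g_2(x), \qquad g_1, g_2 \text{ convex}.
```

Every max-affine function is convex, so the difference of two max-affine functions is one simple instance of a DC function.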

Paper Structure

This paper contains 22 sections, 28 theorems, 75 equations, 7 figures, 2 algorithms.

Key Result

Theorem 1

Consider the estimation problem eq:data-model, where the $n$ i.i.d. samples ${\mathcal{D}_n}$ are drawn from an unknown distribution $P_*$ over $\mathcal{X}_* \times \mathbb{R}$, and the regression function $f_*$ is $\lambda_*$-Lipschitz over $\mathcal{X}_*$ w.r.t. $\lVert\cdot\rVert$. Suppose the c[…] Let $\triangleright \in\{1,2,\infty,+\}$, and $f_n^+$ be the DCF estimator computed by alg:DCF usin[…]

Figures (7)

  • Figure 1: AFPC partition size ($K$) for sample sizes $n \in \{1024, 2048, 4096\}$, and average cell size distribution for $n = 4096$. The upper bound of $K$ is $n^{d/(2+d)}$. The black vertical lines on the average cell size axes mark the value of $d$. The plots for pumadyn-8nh are similar to those of pumadyn-8nm and are omitted for brevity.
  • Figure 2: Test MSEs of the estimators trained on sample sizes $n \in \{1024, 2048, 4096\}$. The performance is very similar across all estimators for both MM and STD scalings of the pumadyn datasets; therefore, the plots for the latter are omitted for brevity.
  • Figure 3: Training and prediction times (in seconds and milliseconds, respectively) are shown for the pumadyn-8nm dataset with MM scaling in the left and center panels. Prediction times are measured on the entire test set (whose size varies with $n$) and normalized to $1000$ samples. The right panel shows the number of parameters used by the initial and final DCF estimators, $f_n$ and $f_n^+$, respectively.
  • Figure 4: Test MSEs, using the same notations as above, where all DCF models are trained with the weaker regularizer $\theta_2 = (R_{\mathcal{X}_n}/n)^2$.
  • Figure 5: Approximation of a function $f \in \mathcal{F}_{\lambda,\mathcal{X}}$ by the max-concave $\hat{f}$ and the min-convex $\check{f}$ as defined above. The left two plots use $f(x) = x\sin(x)$, while the right two plots use $f(x) = \max\{1-|x-1|,2-|x-3|,1-|x-5|/2\}$, both over $\mathcal{X} = [0,6]$. The shaded regions represent $\lambda\epsilon$ and $2\lambda\epsilon$ bounds around $f$. Black circles mark the locations of the $10$ equidistant centers $\mathcal{X}_\epsilon$, forming an $\epsilon$-cover of $\mathcal{X}$ with $\epsilon = 1/3$. The horizontal line is shown at the height of zero. FVU (fraction of variance unexplained) is calculated over $n = 1000$ equidistant points ${\boldsymbol{x}}_1,\ldots,{\boldsymbol{x}}_n \in \mathcal{X}$ with $y_i = f({\boldsymbol{x}}_i)$ as: $\textrm{FVU}(\hat{f}) \doteq \sum_{i\in[n]}(y_i-\hat{f}({\boldsymbol{x}}_i))^2/\sum_{i\in[n]}(y_i-\bar{y})^2$, where $\bar{y} \doteq (1/n)\sum_{i\in[n]}y_i$.
  • ...and 2 more figures
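
The setup in the Figure 5 caption can be sketched numerically. The paper's definitions of $\hat{f}$ and $\check{f}$ are not included in this excerpt, so the sketch below assumes the classical McShane-Whitney-style forms $\hat{f}(x) = \max_k [f(c_k) - \lambda|x - c_k|]$ and $\check{f}(x) = \min_k [f(c_k) + \lambda|x - c_k|]$ over the centers $c_k$. The function names and the choice $\lambda = 7$ (a safe Lipschitz bound for $x\sin(x)$ on $[0,6]$) are also assumptions made for illustration. The FVU formula follows the caption.

```python
import numpy as np


def max_concave(x, centers, values, lam):
    """Assumed form: max over centers of f(c_k) - lam * |x - c_k|."""
    dist = np.abs(x[:, None] - centers[None, :])
    return np.max(values[None, :] - lam * dist, axis=1)


def min_convex(x, centers, values, lam):
    """Assumed form: min over centers of f(c_k) + lam * |x - c_k|."""
    dist = np.abs(x[:, None] - centers[None, :])
    return np.min(values[None, :] + lam * dist, axis=1)


def fvu(y, y_hat):
    """Fraction of variance unexplained, as defined in the caption."""
    return np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)


def f(x):
    return x * np.sin(x)


# Assumption: lam = 7 bounds |f'(x)| = |sin x + x cos x| <= 1 + 6 on [0, 6].
lam = 7.0
eps = 1.0 / 3.0
centers = np.linspace(0.0, 6.0, 10)  # 10 equidistant centers, spacing 2/3
x = np.linspace(0.0, 6.0, 1000)      # n = 1000 equidistant evaluation points
y = f(x)

f_hat = max_concave(x, centers, f(centers), lam)
f_chk = min_convex(x, centers, f(centers), lam)

print("FVU(max-concave):", fvu(y, f_hat))
print("FVU(min-convex): ", fvu(y, f_chk))
print("max gap below f: ", np.max(y - f_hat), "<= 2*lam*eps =", 2 * lam * eps)
```

Under these assumed forms, $\hat{f}$ never exceeds $f$ and $\check{f}$ never falls below it, and each stays within $2\lambda\epsilon$ of $f$ because every evaluation point lies within $\epsilon$ of some center.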

Theorems & Definitions (28)

  • Theorem 1
  • Theorem 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • Lemma 7: e.g., Wainwright2019, Lemma 5.7
  • Lemma 8: Balazs2022, Lemma 4.2
  • Lemma 9
  • Lemma 10
  • ...and 18 more