Conditional regression for the Nonlinear Single-Variable Model

Yantao Wu, Mauro Maggioni

TL;DR

The paper introduces the Nonlinear Single-Variable Model (NSVM) for high-dimensional regression, where the regression function is F(X)=f(Π_γ X) with an unknown curved projection γ and a one-dimensional link f. It develops a nonparametric estimator based on conditional (inverse) regression that recovers the geometry of γ, the nonlinear projection Π_γ, and the outer function f, all while avoiding the curse of dimensionality; the estimator runs in near-linear time in the sample size and achieves a near-minimax rate for one-dimensional regression, up to log factors, with a controllable curve-approximation error. The theoretical analysis provides concentration bounds for slice-based parameter estimates, distance-based slice assignment accuracy, and a detailed MSE decomposition, culminating in guarantees that the overall error matches the 1D rate under mild, dimension-free assumptions on f and γ. Numerical experiments on circular arcs and Meyer helix curves, plus an application to reaction-path committor learning in Langevin dynamics, demonstrate robustness to curvature and dimensionality, and illustrate the estimator’s interpretability via local tangent estimates to γ. The work suggests that, by exploiting nonlinear compositional structure with a one-dimensional manifold, one can effectively learn high-dimensional functions without incurring exponential dependence on ambient dimension, with potential extensions to higher-dimensional manifolds and non-monotone link functions.

Abstract

Regressing a function $F$ on $\mathbb{R}^d$ without the statistical and computational curse of dimensionality requires special statistical models, for example ones that impose geometric assumptions on the distribution of the data (e.g., that its support is low-dimensional), or strong smoothness assumptions on $F$, or a special structure of $F$. Among the latter, compositional models $F=f\circ g$ with $g$ mapping to $\mathbb{R}^r$ with $r\ll d$ include classical single- and multi-index models, as well as neural networks. While the case where $g$ is linear is well understood, less is known when $g$ is nonlinear, and in particular for which $g$'s the curse of dimensionality in estimating $F$, or both $f$ and $g$, may be circumvented. Here we consider a model $F(X):=f(\Pi_\gamma X)$ where $\Pi_\gamma:\mathbb{R}^d\to[0,\textrm{len}_\gamma]$ is the closest-point projection onto the parameter of a regular curve $\gamma:[0, \textrm{len}_\gamma]\to\mathbb{R}^d$, and $f:[0,\textrm{len}_\gamma]\to \mathbb{R}^1$. The input data $X$ is not low-dimensional: it can be as far from $\gamma$ as the condition that $\Pi_\gamma(X)$ is well-defined allows. The distribution of $X$, the curve $\gamma$ and the function $f$ are all unknown. This model is a natural nonlinear generalization of the single-index model, which corresponds to $\gamma$ being a line. We propose a nonparametric estimator, based on conditional regression, that, under suitable assumptions, the strongest of which is that $f$ is coarsely monotone, achieves, up to log factors, the $\textit{one-dimensional}$ optimal min-max rate for non-parametric regression, up to the level of noise in the observations, and can be constructed in time $\mathcal{O}(d^2 n\log n)$. All the constants in the learning bounds, in the minimal number of samples required for our bounds to hold, and in the computational complexity are at most low-order polynomials in $d$.
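
To make the model $F(X)=f(\Pi_\gamma X)$ concrete, the following is a minimal sketch that samples data from it when $\gamma$ is a circular arc, a case where the closest-point projection has a closed form. The values $d=20$, unit length, curvature $0.4$ and $\sigma_\zeta=0.03$ are borrowed from the circular-arc experiment described in the Figure 4 caption; the tube radius, the link `tanh(3t)`, the Gaussian noise, the sampling scheme, and the function names `sample_nsvm_arc` and `arc_projection` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def arc_projection(X, curvature, length):
    """Arc-length parameter of the closest point on a circular arc.

    Assumes gamma(t) = R * (cos(t/R), sin(t/R), 0, ..., 0) with R = 1/curvature,
    t in [0, length], length < pi * R, and points closer to the arc than R.
    """
    radius = 1.0 / curvature
    theta = np.arctan2(X[:, 1], X[:, 0])
    return np.clip(radius * theta, 0.0, length)

def sample_nsvm_arc(n, d=20, curvature=0.4, length=1.0, tube=0.5,
                    noise_std=0.03, link=lambda t: np.tanh(3.0 * t), seed=0):
    """Draw (X, Y, t) with Y = link(t) + noise and t the projection of X."""
    rng = np.random.default_rng(seed)
    radius = 1.0 / curvature
    t = rng.uniform(0.0, length, size=n)      # hidden one-dimensional variable
    angle = t / radius

    gamma = np.zeros((n, d))                  # points on the curve
    gamma[:, 0] = radius * np.cos(angle)
    gamma[:, 1] = radius * np.sin(angle)

    tangent = np.zeros((n, d))                # unit tangent at each point
    tangent[:, 0] = -np.sin(angle)
    tangent[:, 1] = np.cos(angle)

    # Offsets orthogonal to the tangent, with norm uniform in [0, tube] (assumed).
    g = rng.standard_normal((n, d))
    g -= np.sum(g * tangent, axis=1, keepdims=True) * tangent
    g *= rng.uniform(0.0, tube, size=(n, 1)) / np.linalg.norm(g, axis=1, keepdims=True)

    X = gamma + g
    Y = link(t) + noise_std * rng.standard_normal(n)
    return X, Y, t

X, Y, t = sample_nsvm_arc(1000)
# Offsets are normal to the curve and shorter than R, so the projection recovers t.
print(np.max(np.abs(arc_projection(X, curvature=0.4, length=1.0) - t)))
```

Because each offset is orthogonal to the tangent and shorter than the radius of curvature, the closed-form projection returns the sampled parameter, which is the sense in which $\Pi_\gamma(X)$ is well-defined for this toy curve.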

Paper Structure

This paper contains 35 sections, 21 theorems, 138 equations, 11 figures, 2 tables, 1 algorithm.

Key Result

Theorem 2

Suppose that $f\in\mathcal{C}^s(\mathbb{R}^1)$ for some $s\in[\frac{1}{2},2]$ and that $f$ is coarsely monotone. With some assumptions on the underlying curve $\gamma$, the distribution $\rho_X$ of the random variable $X$, and the variance $\sigma_\zeta^2$ of the noise $\zeta$, if the number of trai… The dependency of the constants $C_1,C_2$ on $d$ is a low-order polynomial. The estimator …

Figures (11)

  • Figure 1: One example of a Nonlinear Single-Variable Model (Definition 1): the underlying curve $\gamma$, plotted in black, is a Meyer helix in $\mathbb{R}^{36}$ (details in the Meyer helix appendix) with $\sigma_\gamma=0.5$, the link function $f\in\mathcal{C}^{0.7}(\mathbb{R}^1)$ is strictly monotone, and $\zeta\equiv 0$. We generate $n=5000$ samples $X_i$ scattered near the curve in a tube of radius 6, colored by $Y_i= F(X_i) = f(\Pi_\gamma X_i)$. Left: Random projection of the data onto $\mathbb{R}^3$. Right: Orthogonal projection of the data onto the first $3$ principal components. The distribution around this curve does not appear to be linearly embeddable in low dimensions without increasing its complexity; see the Meyer helix appendix.
  • Figure 2: In the same setup as in Fig. 1, we partition the range uniformly into $l=800$ intervals, and consider two slices. Top: a visualization of the two empirical slices, where we only plot 2000 samples per slice (in green and blue), with $\gamma$ in black. The red circles and vectors are the sample means and smallest principal components of the two empirical slices. Bottom: bar plots with the largest, second largest, average, second smallest, and smallest singular value of the empirical slices. The smallest singular value is significantly smaller than the others.
  • Figure 3: With the same setup and visual conventions as in Fig. 1, but with the range $R$ split uniformly into 80 intervals, and a different pair of slices. Top: the slices are now elongated along the curve, rather than perpendicularly to it. Bottom: the largest singular value is now significantly larger than the remaining ones.
  • Figure 4: Numerical tests for Example 1: circular arcs embedded in $\mathbb{R}^d$, $d=20$, with unit length and curvature varying in $[0.04,0.4]$. We fix the noise level $\sigma_\zeta=0.03$. Top row: MSE for ${\widehat{F}}$ (left) and MSE at $n=2\times10^6$ as a function of curvature (right); Bottom row: estimation error for the center along the tangential direction (left) and difference between the estimated significant vector and the tangential direction (right), over $5$ runs.
  • Figure 5: Numerical tests for Example 2.1: Meyer helix in $d=7$ dimensions, with $f$ of smoothness exponent $s=2$, and noise level $\sigma_\zeta$ varying in $[0.05,0.2]$. Top row: MSE for ${\widehat{F}}$ (left) and MSE at $n=2\times10^6$ as a function of $\sigma_\zeta$ (right); Bottom row: estimation error for the center along the tangential direction (left) and difference between the estimated significant vector and the tangential direction (right), over $5$ runs.
  • ...and 6 more figures
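
The slicing described in the captions of Figures 2 and 3 (partition the range of the response into intervals, then look at the sample mean and principal components of each empirical slice) can be sketched as follows. This is an illustration of the idea only, not the authors' Algorithm 1: the helper name `slice_statistics`, the toy circular arc in $\mathbb{R}^5$, the noiseless link $f(t)=t$, and all parameter values are assumptions chosen so that the slices are thin.

```python
import numpy as np

def slice_statistics(X, Y, num_slices):
    """Per-slice mean, singular values and smallest principal direction.

    Slices are the preimages of a uniform partition of the range of Y.
    Slices with no more points than the ambient dimension are skipped.
    """
    edges = np.linspace(Y.min(), Y.max(), num_slices + 1)
    labels = np.digitize(Y, edges[1:-1])      # values in 0 .. num_slices - 1
    stats = {}
    for k in range(num_slices):
        Xk = X[labels == k]
        if len(Xk) <= X.shape[1]:
            continue
        mean = Xk.mean(axis=0)
        # Singular values come out in decreasing order, so Vt[-1] is the
        # direction of smallest variance within the slice.
        _, svals, Vt = np.linalg.svd(Xk - mean, full_matrices=False)
        stats[k] = (mean, svals / np.sqrt(len(Xk)), Vt[-1])
    return stats

# Toy data (assumed): circular arc of radius 2.5 in R^5, offsets normal to the arc.
rng = np.random.default_rng(0)
n, d, radius, tube = 4000, 5, 2.5, 0.5
t = rng.uniform(0.0, 1.0, n)
ang = t / radius
normal = np.zeros((n, d)); normal[:, 0] = np.cos(ang); normal[:, 1] = np.sin(ang)
tangent = np.zeros((n, d)); tangent[:, 0] = -np.sin(ang); tangent[:, 1] = np.cos(ang)
off = rng.standard_normal((n, d))
off -= np.sum(off * tangent, axis=1, keepdims=True) * tangent
off *= rng.uniform(0.0, tube, (n, 1)) / np.linalg.norm(off, axis=1, keepdims=True)
X = radius * normal + off
Y = t                                         # noiseless, strictly monotone link

stats = slice_statistics(X, Y, num_slices=20)
mean, svals, pc = stats[10]

# Compare the smallest principal direction with the true tangent at mid-slice.
t_mid = Y.min() + 10.5 * (Y.max() - Y.min()) / 20
true_tangent = np.zeros(d)
true_tangent[0] = -np.sin(t_mid / radius)
true_tangent[1] = np.cos(t_mid / radius)
alignment = abs(pc @ true_tangent)
print(alignment, svals)
```

With thin slices the smallest singular value separates from the rest and its direction aligns with the tangent to the curve, which is the regime shown in Figure 2; using far fewer slices makes each slice elongated along the curve, the regime shown in Figure 3.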

Theorems & Definitions (22)

  • Definition 1: Nonlinear Single-Variable Model
  • Theorem 2: Informal
  • Theorem 3: MSE of the Estimator constructed by Algorithm 1
  • Theorem 4: MSE of Algorithm 1 in the noiseless case
  • Theorem 5: NVM without assumption (LCV)
  • Proposition 6
  • Proposition 7
  • Corollary 8
  • Corollary 9
  • Proposition 10: nearest index is almost correct
  • ...and 12 more