Learning on manifolds without manifold learning

H. N. Mhaskar, Ryan O'Dowd

TL;DR

The paper tackles the problem of learning a function from data drawn on an unknown low-dimensional manifold by avoiding explicit manifold learning and instead projecting the data onto a hypersphere. It introduces a one-shot approximation using a localized spherical polynomial kernel and an empirical operator $F_n(\mathcal{D};x)$, achieving dimension-dependent rates with sample complexity $M\gtrsim n^{q+2\gamma}\log(n/\delta)$ for target smoothness $\gamma$ on a $q$-dimensional manifold. The method relies only on the manifold dimension $q$, not the ambient dimension, and provides an explicit integral reconstruction framework via tangential lifting and Bernstein concentration for discretization. Numerical experiments in magnetic resonance relaxometry and Darcy flow illustrate robustness to noise and applicability to inverse problems.

Abstract

Function approximation based on data drawn randomly from an unknown distribution is an important problem in machine learning. The manifold hypothesis assumes that the data is sampled from an unknown submanifold of a high dimensional Euclidean space. A great deal of research deals with obtaining information about this manifold, such as the eigendecomposition of the Laplace-Beltrami operator or coordinate charts, and using this information for function approximation. This two-step approach implies some extra errors in the approximation stemming from estimating the basic quantities of the data manifold in addition to the errors inherent in function approximation. In this paper, we project the unknown manifold as a submanifold of an ambient hypersphere and study the question of constructing a one-shot approximation using a specially designed sequence of localized spherical polynomial kernels on the hypersphere. Our approach does not require preprocessing of the data to obtain information about the manifold other than its dimension. We give optimal rates of approximation for relatively ``rough'' functions.
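The empirical operator $F_n(\mathcal{D};x)$ mentioned in the TL;DR can be illustrated in the simplest possible setting. The sketch below is not the authors' implementation; it makes several assumptions. The manifold is a great circle ($q=1$) inside $\mathbb{S}^2$, the samples are uniform on it (so the sampling density is constant), the kernel is $\Phi_n(t)=\sum_{k<n} h(k/n)\,w_k\,T_k(t)$ with Chebyshev polynomials $T_k$ and weights $w_0=1$, $w_k=2$, and the operator is the plain average $\frac{1}{M}\sum_j z_j\,\Phi_n(x\cdot y_j)$. The names `smooth_filter`, `localized_kernel_q1`, and `empirical_operator` are hypothetical, and the paper's normalization may differ.

```python
import numpy as np
from numpy.polynomial import chebyshev


def smooth_filter(t):
    """Low-pass filter h: equal to 1 on [0, 1/2], 0 on [1, inf), smooth in between."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    out[t <= 0.5] = 1.0
    mid = (t > 0.5) & (t < 1.0)
    s = (t[mid] - 0.5) / 0.5  # rescale the transition zone to (0, 1)
    a = np.exp(-1.0 / s)
    b = np.exp(-1.0 / (1.0 - s))
    out[mid] = b / (a + b)
    return out


def localized_kernel_q1(n, t):
    """Assumed q = 1 kernel: sum_k h(k/n) * w_k * T_k(t), with w_0 = 1 and w_k = 2."""
    k = np.arange(n)
    weights = np.where(k == 0, 1.0, 2.0)
    coeffs = smooth_filter(k / n) * weights
    return chebyshev.chebval(t, coeffs)


def empirical_operator(n, Y, z, X):
    """Average of z_j * Phi_n(x . y_j) over the samples, for each row x of X."""
    gram = np.clip(X @ Y.T, -1.0, 1.0)  # ambient inner products only
    return localized_kernel_q1(n, gram) @ z / len(z)


def circle_points(theta):
    """Embed angles on a great circle of the unit sphere in R^3."""
    u = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
    v = np.array([0.0, 0.0, 1.0])
    return np.outer(np.cos(theta), u) + np.outer(np.sin(theta), v)


def target(theta):
    """Illustrative target: a trigonometric polynomial of degree 2."""
    return np.cos(theta) + 0.5 * np.sin(2.0 * theta)


rng = np.random.default_rng(0)
M, n = 2**13, 8
theta_train = rng.uniform(0.0, 2.0 * np.pi, size=M)
theta_test = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)

estimate = empirical_operator(
    n, circle_points(theta_train), target(theta_train), circle_points(theta_test)
)
max_error = np.max(np.abs(estimate - target(theta_test)))
print(f"max abs error: {max_error:.3f}")
```

The estimator only ever sees ambient inner products `X @ Y.T`; the circle's parametrization is used to generate data, never to fit it. The kernel is chosen by the manifold dimension ($q=1$) rather than the ambient dimension, which is the point the abstract makes about needing no manifold information beyond its dimension.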

Paper Structure

This paper contains 20 sections, 16 theorems, 123 equations, 10 figures, 1 table, 1 algorithm.

Key Result

Theorem 1.1

(Informal statement) Let $\mathcal{D}=\{(y_j,z_j)\}_{j=1}^M$ be a set of random samples chosen from a distribution $\tau$. Suppose $f$ belongs to a smoothness class $W_\gamma$ (detailed in Definition \ref{def:manifold_smoothness}) with associated norm $\|\circ\|_{W_\gamma}$. Then und… where $c$ is a positive constant independent of $f$.

Figures (10)

  • Figure 1: Error comparison between our method, the Nadaraya-Watson estimator, and an interpolatory RBF network. (Left) Comparison of absolute errors between the methods, with the target function plotted on the right $y$-axis for the benefit of the viewer. Note that the error from the RBF method is scaled by $10^{-3}$ so that it does not dominate the figure. (Right) Percent point plot of the log absolute error for all three methods.
  • Figure 2: Visualization of our approximation approach. Here, $\mathbb{X}$ is a submanifold of the sphere $\mathbb{S}^Q$. The map $\eta_x$, analogous to the exponential map, allows us to relate the part of the integral in \ref{eq:manifold_summabilityop} near $x$ with an integral on the tangent sphere at $x$ via a change of variables (solid curves). The localization of the kernels in our method allows the approximation to be extended over $\mathbb{X}$ and the tangent sphere $\mathbb{S}_x$ (dotted curves).
  • Figure 3: Left y-axis: Plot of the true function $f$ compared with $F_{32}$ constructed by $2^{13}$ noiseless training points. Right y-axis: Plot of $\left|f-F_{32}\right|$.
  • Figure 4: (Left) Percent point plot of log absolute error for various $n$ with $M=2^{13}$ training points and no noise. (Center) Percent point plot of log absolute error for various choices of $M$ with no noise. (Right) Percent point plot of log absolute error for various noise levels with $M=2^{13}$ training points.
  • Figure 5: (Left) Percent point plot of log combined error for various $n$ with $M=2^{13}$ training points, and no noise. (Center) Percent point plot of log combined error for fixed $n=32$, various choices of $M$, and no noise. (Right) Percent point plot of log combined error for fixed $n=32$, fixed $M=2^{13}$ training points, and various noise levels.
  • ...and 5 more figures

Theorems & Definitions (33)

  • Theorem 1.1
  • Example 2.1
  • Remark 4.1
  • Proposition 4.1
  • proof
  • Proposition 4.2
  • Remark 4.2
  • Theorem 4.1
  • proof
  • Definition 5.1
  • ...and 23 more