Diffusion-based Semi-supervised Spectral Algorithm for Regression on Manifolds

Weichun Xia, Jiaxin Jiang, Lei Shi

TL;DR

A novel diffusion-based spectral algorithm for regression on high-dimensional data, particularly data embedded within lower-dimensional manifolds. The algorithm achieves a convergence rate that depends solely on the intrinsic dimension of the underlying manifold, thereby avoiding the curse of dimensionality associated with the higher ambient dimension.

Abstract

We introduce a novel diffusion-based spectral algorithm to tackle regression analysis on high-dimensional data, particularly data embedded within lower-dimensional manifolds. Traditional spectral algorithms often fall short in such contexts, primarily because they rely on predetermined kernel functions, which inadequately address the complex structures inherent in manifold-based data. By employing graph Laplacian approximation, our method uses the local estimation property of the heat kernel, offering an adaptive, data-driven approach to overcome this obstacle. Another distinct advantage of our algorithm lies in its semi-supervised learning framework, which enables it to make full use of additional unlabeled data. This ability enhances performance by allowing the algorithm to explore the spectrum and curvature of the data manifold, providing a more comprehensive understanding of the dataset. Moreover, our algorithm operates in an entirely data-driven manner, working directly within the intrinsic manifold structure of the data without requiring any predefined manifold information. We provide a convergence analysis of our algorithm. Our findings reveal that the algorithm achieves a convergence rate that depends solely on the intrinsic dimension of the underlying manifold, thereby avoiding the curse of dimensionality associated with the higher ambient dimension.
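The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a Gaussian affinity graph, a symmetric normalized graph Laplacian, a truncated spectral approximation of the heat kernel, and a kernel ridge regression filter (one of the spectral algorithms named in the figure captions). The function names, the test function `f_true`, the sample sizes, and the parameters `eps`, `t`, `n_eig`, and `lam` are all hypothetical choices made for illustration.

```python
import numpy as np

def sample_sphere(n, rng):
    """Draw n points uniformly from the unit sphere S^2 embedded in R^3."""
    x = rng.normal(size=(n, 3))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def heat_kernel_from_graph(X, eps, t, n_eig):
    """Approximate a heat kernel on the data manifold from ALL points
    (labeled and unlabeled) via a graph Laplacian. Assumed construction."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (4.0 * eps))              # Gaussian affinities
    deg = W.sum(axis=1)
    S = W / np.sqrt(np.outer(deg, deg))        # D^{-1/2} W D^{-1/2}
    L = np.eye(len(X)) - S                     # symmetric normalized Laplacian
    mu, U = np.linalg.eigh(L)                  # eigenvalues in ascending order
    mu, U = mu[:n_eig], U[:, :n_eig]           # keep the smoothest modes
    weights = np.exp(-t * mu / eps)            # heat-type spectral decay
    # Rescale by N so kernel entries are O(1) instead of O(1/N).
    return len(X) * (U * weights) @ U.T

def fit_predict(H, labeled_idx, y, lam):
    """Kernel ridge regression on the labeled block, evaluated at all points."""
    n = len(labeled_idx)
    H_ll = H[np.ix_(labeled_idx, labeled_idx)]
    alpha = np.linalg.solve(H_ll + n * lam * np.eye(n), y)
    return H[:, labeled_idx] @ alpha

rng = np.random.default_rng(0)
X = sample_sphere(500, rng)
f_true = 2.0 * X[:, 2] + X[:, 0] * X[:, 1]     # hypothetical smooth target
labeled_idx = np.arange(120)                   # first 120 points carry labels
unlabeled_idx = np.arange(120, 500)
y = f_true[labeled_idx] + 0.1 * rng.normal(size=labeled_idx.size)

H = heat_kernel_from_graph(X, eps=0.05, t=0.05, n_eig=30)
pred = fit_predict(H, labeled_idx, y, lam=1e-3)

rmse = np.sqrt(np.mean((pred[unlabeled_idx] - f_true[unlabeled_idx]) ** 2))
print(f"RMSE on unlabeled points: {rmse:.3f}")
```

The unlabeled points enter only through the graph Laplacian, which is where the semi-supervised benefit comes from in this sketch: more points give a better estimate of the manifold's spectrum, while the labels are used only in the final linear solve. No manifold information beyond the ambient coordinates is supplied.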

Paper Structure

This paper contains 22 sections, 13 theorems, 262 equations, 2 figures, 1 algorithm.

Key Result

Lemma 1

Suppose that $\mathcal{M}$ is a compact, connected Riemannian manifold of dimension $d$. Then, for the eigensystem of the Laplacian $\Delta$ on $\mathcal{M}$ and for any $k\in\mathbb{N}$, we have the following estimates: where $C_{low}$, $C_{up}$, and $D_1$ are constants that depend only on $\mathcal{M}$.
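The displayed estimates of Lemma 1 are not reproduced in this summary. For orientation only, two classical facts about the Laplacian on a compact $d$-dimensional Riemannian manifold are a two-sided Weyl-type eigenvalue bound and Hörmander's sup-norm bound for eigenfunctions. Matching them to the constants $C_{low}$, $C_{up}$, and $D_1$ is an assumption, as are the symbols $\lambda_k$ and $\phi_k$; the source does not confirm that these are the exact statements of the lemma.

```latex
% Classical estimates, stated for orientation; assumed, not confirmed, to match Lemma 1.
% Eigenvalues ordered as 0 = \lambda_0 < \lambda_1 \le \lambda_2 \le \dots
% Two-sided form of Weyl's law:
C_{low}\, k^{2/d} \;\le\; \lambda_k \;\le\; C_{up}\, k^{2/d}, \qquad k \ge 1,
% Hörmander's sup-norm bound for L^2-normalized eigenfunctions \phi_k:
\|\phi_k\|_{\infty} \;\le\; D_1\, \lambda_k^{(d-1)/4}.
```

Bounds of this kind explain why only the intrinsic dimension $d$ appears in such results: the growth of $\lambda_k$ is governed by $k^{2/d}$, with no reference to the ambient dimension.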

Figures (2)

  • Figure 1: Results for regression function $f^*(\theta)=20\sin(\theta)+24\cos(\theta)$ on $S^2$ over a labeled dataset with $140$ data points and an unlabeled dataset with $1460$ data points. The three panels show the outcomes of kernel ridge regression, kernel principal component regularization, and gradient flow. In each panel, the actual values of $f^*$ are represented by a continuous blue curve, contrasting with the orange scatter points that signify the algorithm's predictions.
  • Figure 2: Results for regression function $f^*(\theta,\phi)=20\sin(\theta)\phi$ on $S^2$ over the same dataset with $140$ labeled points and $1460$ unlabeled points. One panel visualizes the true distribution of $f^*$, serving as a benchmark for comparison. The remaining three panels illustrate the predictions made by kernel ridge regression, kernel principal component regularization, and gradient flow. In these panels, the color assigned to each sample point denotes its response value.

Theorems & Definitions (26)

  • Lemma 1
  • Example 1
  • Example 2
  • Example 3
  • Theorem 1
  • Remark 1
  • Theorem 2
  • Remark 2
  • Theorem 3
  • Lemma 2
  • ...and 16 more