Bi-stochastically normalized graph Laplacian: convergence to manifold Laplacian and robustness to outlier noise

Xiuyuan Cheng, Boris Landa

TL;DR

This work establishes that bi-stochastically normalized graph Laplacians, computed via a constrained Sinkhorn–Knopp procedure, converge pointwise to the weighted manifold Laplacian $\Delta_p$ for data sampled i.i.d. from a compact $d$-dimensional manifold, with rates matching those of traditional normalizations. It introduces an approximate, constrained matrix scaling formulation with early termination, proving 2-norm convergence under finite-sample, non-asymptotic conditions and deriving explicit scaling of the kernel bandwidth $\epsilon$ with sample size $n$ to optimize the rate. The paper further extends the theory to data corrupted by outlier noise, proving that the bi-stochastic Laplacian remains robust under a bounded-noise regime and providing a practical SK-based algorithm for noisy data. Numerical experiments on clean and noisy manifolds corroborate the theory, demonstrating accuracy comparable to diffusion-map Laplacians on clean data and superior robustness to high-dimensional outliers, with practical benefits from early-stopped SK iterations.
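The early-stopped scaling step mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the function names `gaussian_affinity` and `sk_scaling`, the kernel convention $\exp(-\|x_i - x_j\|^2 / (4\epsilon))$ with zero diagonal, the toy circle data, and the stopping rule on $\| D_{\eta} W D_{\eta} {\bf 1} - {\bf 1} \|_\infty$ are all choices made here for illustration, and the constraint on the scaling factor used in the paper is omitted.

```python
import numpy as np


def gaussian_affinity(X, eps):
    """Gaussian kernel affinity with zero diagonal (assumed convention)."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip round-off negatives
    W = np.exp(-sq_dists / (4.0 * eps))
    np.fill_diagonal(W, 0.0)
    return W


def sk_scaling(W, eps_sk=1e-3, max_iter=5000):
    """Sinkhorn-Knopp iterations on a symmetric W with early termination.

    Returns a positive vector eta such that diag(eta) @ W @ diag(eta) has
    row sums within eps_sk of 1 (in max norm), if reached within max_iter.
    """
    n = W.shape[0]
    u = np.ones(n)
    v = np.ones(n)
    eta = np.ones(n)
    residual = np.inf
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # Standard alternating SK updates for the left/right factors.
        u = 1.0 / (W @ v)
        v = 1.0 / (W @ u)
        # Symmetrize: for symmetric W the two factors agree up to a scalar.
        eta = np.sqrt(u * v)
        residual = np.max(np.abs(eta * (W @ eta) - 1.0))
        if residual <= eps_sk:  # early termination
            break
    return eta, residual, n_iter


# Toy data: 200 random points on the unit circle (illustrative only).
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
X = np.column_stack([np.cos(theta), np.sin(theta)])

W = gaussian_affinity(X, eps=0.05)
eta, residual, n_iter = sk_scaling(W, eps_sk=1e-3)
K = eta[:, None] * W * eta[None, :]  # approximately bi-stochastic affinity
print(n_iter, residual)
```

Loosening `eps_sk` trades exactness of the row sums for fewer iterations, which is the kind of trade-off the early-termination analysis is concerned with.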

Abstract

Bi-stochastic normalization provides an alternative normalization of graph Laplacians in graph-based data analysis and can be computed efficiently by Sinkhorn-Knopp (SK) iterations. This paper proves the convergence of the bi-stochastically normalized graph Laplacian to the manifold (weighted) Laplacian with rates, when $n$ data points are sampled i.i.d. from a general $d$-dimensional manifold embedded in a possibly high-dimensional space. Under a certain joint limit of $n \to \infty$ and kernel bandwidth $\epsilon \to 0$, the point-wise convergence rate of the graph Laplacian operator (under 2-norm) is proved to be $O(n^{-1/(d/2+3)})$ at finite large $n$ up to log factors, achieved at the scaling of $\epsilon \sim n^{-1/(d/2+3)}$. When the manifold data are corrupted by outlier noise, we theoretically prove the point-wise consistency of the graph Laplacian, with a rate that matches the one for clean manifold data plus an additional term proportional to the bound on the inner products of the noise vectors among themselves and with data vectors. Motivated by our analysis, which suggests that an approximate rather than exact bi-stochastic normalization achieves the same consistency rate, we propose an approximate and constrained matrix scaling problem that can be solved by SK iterations with early termination. Numerical experiments support our theoretical results and show the robustness of the bi-stochastically normalized graph Laplacian to high-dimensional outlier noise.
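As a worked reading of the stated rate (plain arithmetic on the exponent, ignoring constants and log factors; the sample size $n = 10^4$ is an illustrative choice, not a value taken from the paper):

$$
\frac{1}{d/2+3} = \frac{2}{d+6}, \qquad
d = 1:\ \epsilon \sim n^{-2/7}, \qquad
d = 2:\ \epsilon \sim n^{-1/4}.
$$

$$
n = 10^4:\qquad n^{-2/7} = 10^{-8/7} \approx 0.072 \quad (d=1), \qquad n^{-1/4} = 10^{-1} = 0.1 \quad (d=2).
$$

The exponent shrinks as the intrinsic dimension $d$ grows, so the same sample size yields a slower-decaying error bound on higher-dimensional manifolds.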

Paper Structure (60 sections, 18 theorems, 207 equations, 7 figures, 2 tables, 1 algorithm)

Key Result

Lemma 3.2

Under Assumptions (A1)(A2), there exists a function $r \in C^4({\cal M})$, determined by $p$ and the manifold extrinsic coordinates, such that $q_\epsilon$ is $C^4$ on ${\cal M}$ and satisfies the following: (i) There are positive constants $q_{\rm min}$, $q_{\rm max}$, and $\epsilon_{0}$, determined by $r$ and $p$, such that for $\epsilon < \epsilon_{0}$, $0 < q_{\rm min} \le q_\epsilon (x) \le q_{\rm max}$ ...

Figures (7)

  • Figure 1: Results of one simulation on clean manifold data. The kernel bandwidth parameter is $\epsilon = 5.0119 \times 10^{-4}$. The error averaged over multiple simulations is shown in Figure 2. Top panel: (Left) Data samples lying on a one-dimensional closed curve embedded in $\mathbb{R}^4$, where the first three coordinates are shown and colored by the values of the kernel affinity $W_{i_0 j}$ on $x_j$ for a fixed $i_0$. (Middle) Computed values of $\hat{L} \rho_X f$ compared with the true values of $\Delta_p f$. (Right) Comparison of $\hat{\eta}_i$ and $p^{-1/2}(x_i)$. Bottom panel: Results of bi-stochastic normalization computed with different $\varepsilon_{\rm SK}$. (Left) The convergence of SK iterations, showing the value of $\log_{10} \| D_{\hat{\eta}} W^0 D_{\hat{\eta}} {\bf 1} - {\bf 1} \|_\infty$ vs. the number of iterations; (Middle-right) Computed values of $\hat{L}^{(\rm SK)} \rho_X f$ and of $\hat{\eta}_i$. All plots whose $x$-axis is $[0,1]$ are plotted vs. the intrinsic coordinate of $x_i$ (the arclength).
  • Figure 2: Average errors on clean manifold data. (Left) Relative errors $\text{RelErr}_2$, as defined in \ref{eq:def-L1-err}, of $\hat{L} \rho_X f$ computed by (i) the $\alpha$-normalized diffusion map graph Laplacian $\hat{L}^{(\rm DM)}$ and (ii) the bi-stochastically normalized graph Laplacian $\hat{L}^{(\rm SK)}$, plotted vs. a range of values of $\epsilon$ and averaged over 500 simulation replicas. The two blue dashed lines fit the data at the two ends of small and large values of $\epsilon$, where the variance and bias errors dominate respectively. (Right) Same plot for $\text{RelErr}_\infty$.
  • Figure 3: Results of $\hat{L} \rho_X f$ computed from manifold data corrupted by i.i.d. outlier noise in $\mathbb{R}^{m}$, $m=2000$. The kernel bandwidth parameter is $\epsilon = 5 \times 10^{-4}$. Top panel: (Left) The first 3 coordinates of the data vectors, colored by the kernel affinity values $W_{i_0 j}$ on $x_j$ (point $i_0$ is an inlier). (Middle) Computed values of $\hat{L} \rho_X f$ compared with the true values of $\Delta_p f$. (Right) Values of $\hat{\eta}_i$ on inliers (those on outliers are large and outside the plot axis) and those of $\hat{\eta}^c_i$ on the outliers, compared with $p^{-1/2}(x_i)$. Bottom panel: Same plots as in the bottom panel of Figure 1 on data with outlier noise, except that the (Middle) plot shows the values of $\hat{\eta}_i^c$.
  • Figure 4: Same plots as in Figure 2 for average errors on manifold data with heteroskedastic outlier noise.
  • Figure 5: First few (non-constant) eigenvectors of $\hat{L}_{\rm rw}$ computed from $n=1000$ manifold data points with heteroskedastic outlier noise added in $\mathbb{R}^m$, $m=2000$. The Gaussian kernel affinity \ref{eq:def-G-affinity} is computed with $\epsilon = 5 \times 10^{-4}$. Top panel: (Left) First 3 coordinates of data vectors colored by Gaussian kernel affinity values $G_{i_0 j}$ on $x_j$ (point $i_0$ is an inlier). (Middle) Spectral embedding by the first two non-constant eigenvectors of $\hat{L}_{\rm rw}^{(\rm DM)}$, colored by the intrinsic coordinate of $x_i$. (Right) First four non-constant eigenvectors plotted against the intrinsic coordinate of $x_i$. Bottom panel: (Left) The convergence of Sinkhorn iterations; (Middle-right) Same plots as in the top panel for eigenvectors of $\hat{L}_{\rm rw}^{(\rm SK)}$. In this simulation, the MSE errors (defined in Section \ref{subsec:spec-embed}) for DM are 0.1234 and 0.1659 for the first and second pair of harmonics, and those for SK are 0.0030 and 0.0082.
  • ...and 2 more figures

Theorems & Definitions (44)

  • Remark 1: connectivity regime
  • Definition 3.1: $\varepsilon_{\rm SK}$-approximate scaling factor
  • Lemma 3.2: Existence of population scaling factor
  • Remark 2
  • Lemma 3.3: Comparison of approximate scaling factors
  • Remark 3
  • Lemma 3.4: Uniform upper-boundedness of empirical scaling factor
  • Proposition 4.1: Pointwise convergence of $\bar{L}_n$
  • Remark 4: Requirement on the largeness of $\epsilon$
  • Theorem 4.2
  • ...and 34 more