Asymptotic Theory of Eigenvectors for Latent Embeddings with Generalized Laplacian Matrices

Jianqing Fan, Yingying Fan, Jinchi Lv, Fan Yang, Diwen Yu

TL;DR

The paper develops an asymptotic theory for the eigenvectors and eigenvalues of generalized (regularized) Laplacian matrices ${\bf X}= {\bf L}^{-\alpha}\widetilde{\bf X}{\bf L}^{-\alpha}$ in high dimensions, where the entries are dependent. Building on a generalized quadratic vector equation (QVE) and refined local laws for resolvents, the authors derive laws of large numbers (LLNs) and central limit theorems (CLTs) for spiked eigenpairs, providing almost sharp expansions around the population limits $t_k$ and revealing a phase transition in eigenvector projections. A key innovation is a decorrelation technique via an intermediate matrix ${\bf L}_{[i]}$ that mitigates the dependence between the diagonal and random parts, enabling precise asymptotics even under sparsity and non-Bernoulli noise. The results enable uncertainty quantification and inference for latent embeddings in graphs and manifolds, with concrete applications to graph neural networks, confidence intervals for node memberships and network parameters, and uncertainty quantification in community detection. The theory is validated through simulations showing accurate finite-sample performance across a range of $\alpha$ and network sparsity levels. Overall, ATE-GL furnishes a flexible, principled framework for spectral inference with dependent Laplacian-type matrices, with broad implications for network analysis and latent-space modeling.

Abstract

Laplacian matrices are commonly employed in many real applications, encoding the underlying latent structural information such as graphs and manifolds. The use of the normalization terms naturally gives rise to random matrices with dependency. It is well-known that dependency is a major bottleneck of new random matrix theory (RMT) developments. To this end, in this paper, we formally introduce a class of generalized (and regularized) Laplacian matrices, which contains the Laplacian matrix and the random adjacency matrix as a specific case, and suggest the new framework of the asymptotic theory of eigenvectors for latent embeddings with generalized Laplacian matrices (ATE-GL). Our new theory is empowered by the tool of generalized quadratic vector equation for dealing with RMT under dependency, and delicate high-order asymptotic expansions of the empirical spiked eigenvectors and eigenvalues based on local laws. The asymptotic normalities established for both spiked eigenvectors and eigenvalues will enable us to conduct precise inference and uncertainty quantification for applications involving the generalized Laplacian matrices with flexibility. We discuss some applications of the suggested ATE-GL framework and showcase its validity through some numerical examples.
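
As a concrete illustration of the matrix class described above, the following minimal sketch builds ${\bf X}= {\bf L}^{-\alpha}\widetilde{\bf X}{\bf L}^{-\alpha}$ from an adjacency matrix, using the regularized degree matrix ${\bf L}_{\tau,\lambda} = \mathrm{diag}(d_i+\tau\bar{d}+\lambda)$ that appears in the figure captions, without the rescaling parameters $q$ and $\beta_n$. The function names, the two-block network generator, and all parameter values are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def generalized_laplacian(adj, alpha, tau=0.0, lam=0.0):
    """Return L^{-alpha} A L^{-alpha} with L = diag(d_i + tau * mean(d) + lam)."""
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    diag_l = degrees + tau * degrees.mean() + lam
    if alpha > 0 and np.any(diag_l <= 0):
        raise ValueError("regularized degrees must be positive; increase tau or lam")
    scale = diag_l ** (-alpha)
    # Row and column scaling is equivalent to multiplying by diagonal matrices.
    return scale[:, None] * adj * scale[None, :]

def top_eigenpairs(mat, k):
    """Return the k eigenpairs of a symmetric matrix with largest |eigenvalue|."""
    vals, vecs = np.linalg.eigh(mat)
    order = np.argsort(-np.abs(vals))[:k]
    return vals[order], vecs[:, order]

def sample_two_block_network(n, p_in, p_out, rng):
    """Hypothetical test network: symmetric 0/1 adjacency with two communities."""
    labels = np.arange(n) % 2
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # no self-loops
    return (upper | upper.T).astype(float)

rng = np.random.default_rng(0)
adj = sample_two_block_network(200, 0.3, 0.1, rng)

# alpha = 0 leaves the adjacency matrix unchanged.
x_adjacency = generalized_laplacian(adj, alpha=0.0)
# alpha = 1/2 with tau = lam = 0 gives the degree-normalized matrix D^{-1/2} A D^{-1/2}.
x_normalized = generalized_laplacian(adj, alpha=0.5)
# An intermediate, regularized choice.
x_regularized = generalized_laplacian(adj, alpha=0.25, tau=0.1, lam=1.0)

for name, mat in [("adjacency", x_adjacency),
                  ("normalized", x_normalized),
                  ("regularized", x_regularized)]:
    vals, _ = top_eigenpairs(mat, 2)
    print(name, vals)
```

With $\alpha = 1/2$ and no regularization, the largest eigenvalue equals 1 whenever every node has positive degree, which is the trivial eigenvalue mentioned in the caption of Figure 1.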

Paper Structure

This paper contains 47 sections, 43 theorems, 353 equations, 12 figures, 6 tables.

Key Result

Lemma 1

Under parts (ii) and (iii) of Assumption \ref{main_assm}, for each $1 \leq k \leq K_0$, there exists a unique solution $x=t_k$ to equation (\ref{eq:sft_k def}) in the subset $\widetilde{\mathcal{I}}_k$, and it holds that

Figures (12)

  • Figure 1: The kernel density estimate (KDE) for the distribution of the empirical spiked eigenvalue $\widehat{\delta}_k$ corrected by $A_k$ for the generalized Laplacian matrix ${\bf X}$ with $k = 1$ across different values of $\alpha$, based on $500$ replications for the simulation example in Section \ref{new.Sec.simu} with $\theta = 0.9$. The generalized (regularized) Laplacian matrix ${\bf X}$ is as given in (\ref{new.eq.FL.gLap}) with ${\bf L}={\bf L}_{\tau,\lambda} := \mathrm{diag}\left(d_i+\tau\bar{d}+\lambda: i \in [n]\right)$, without the rescaling population parameters $q$ and $\beta_n$. The blue curves represent the KDEs for the empirical spiked eigenvalue corrected by $A_k$, whereas the red curves show the target normal density. Both curves are centered at the asymptotic limit $t_k$. The shape of the top right plot is due to an extremely small empirical standard deviation (as shown in Table \ref{tab1}, with empirical SD = 7.75E-11 and asymptotic SD = 4.89E-07). This is because the normalized Laplacian matrix has a trivial largest eigenvalue at 1.
  • Figure 2: The kernel density estimate (KDE) for the distribution of the empirical spiked eigenvalue $\widehat{\delta}_k$ corrected by $A_k$ for the generalized Laplacian matrix ${\bf X}$ with $k = 2$ across different values of $\alpha$, based on $500$ replications for the simulation example in Section \ref{new.Sec.simu} with $\theta = 0.9$. The generalized (regularized) Laplacian matrix ${\bf X}$ is as given in (\ref{new.eq.FL.gLap}) with ${\bf L}={\bf L}_{\tau,\lambda} := \mathrm{diag}\left(d_i+\tau\bar{d}+\lambda: i \in [n]\right)$, without the rescaling population parameters $q$ and $\beta_n$. The blue curves represent the KDEs for the empirical spiked eigenvalue corrected by $A_k$, whereas the red curves show the target normal density. Both curves are centered at the asymptotic limit $t_k$.
  • Figure 3: The kernel density estimate (KDE) for the distribution of the empirical spiked eigenvalue $\widehat{\delta}_k$ corrected by $A_k$ for the generalized Laplacian matrix ${\bf X}$ with $k = 3$ across different values of $\alpha$, based on $500$ replications for the simulation example in Section \ref{new.Sec.simu} with $\theta = 0.9$. The generalized (regularized) Laplacian matrix ${\bf X}$ is as given in (\ref{new.eq.FL.gLap}) with ${\bf L}={\bf L}_{\tau,\lambda} := \mathrm{diag}\left(d_i+\tau\bar{d}+\lambda: i \in [n]\right)$, without the rescaling population parameters $q$ and $\beta_n$. The blue curves represent the KDEs for the empirical spiked eigenvalue corrected by $A_k$, whereas the red curves show the target normal density. Both curves are centered at the asymptotic limit $t_k$.
  • Figure 4: The kernel density estimate (KDE) for the distribution of the empirical spiked eigenvector component $\widehat{\mathbf{v}}_k(i)$ (rescaled by $L_i^\alpha/\Lambda_i^\alpha$) for the generalized Laplacian matrix ${\bf X}$ with $k = 1$ and $i = 1$ across different values of $\alpha$, based on $500$ replications for the simulation example in Section \ref{new.Sec.simu} with $\theta = 0.9$. The generalized (regularized) Laplacian matrix ${\bf X}$ is as given in (\ref{new.eq.FL.gLap}) with ${\bf L}={\bf L}_{\tau,\lambda} := \mathrm{diag}\left(d_i+\tau\bar{d}+\lambda: i \in [n]\right)$, without the rescaling population parameters $q$ and $\beta_n$. The blue curves represent the KDEs for the rescaled empirical spiked eigenvector component, whereas the red curves show the target normal density. Both curves are centered at the asymptotic limit ${\mathbf v}_k(i)$.
  • Figure 5: The kernel density estimate (KDE) for the distribution of the empirical spiked eigenvector component $\widehat{\mathbf{v}}_k(i)$ (rescaled by $L_i^\alpha/\Lambda_i^\alpha$) for the generalized Laplacian matrix ${\bf X}$ with $k = 2$ and $i = 1$ across different values of $\alpha$, based on $500$ replications for the simulation example in Section \ref{new.Sec.simu} with $\theta = 0.9$. The generalized (regularized) Laplacian matrix ${\bf X}$ is as given in (\ref{new.eq.FL.gLap}) with ${\bf L}={\bf L}_{\tau,\lambda} := \mathrm{diag}\left(d_i+\tau\bar{d}+\lambda: i \in [n]\right)$, without the rescaling population parameters $q$ and $\beta_n$. The blue curves represent the KDEs for the rescaled empirical spiked eigenvector component, whereas the red curves show the target normal density. Both curves are centered at the asymptotic limit ${\mathbf v}_k(i)$.
  • ...and 7 more figures
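
The KDE-versus-normal comparison described in these captions can be imitated with a small Monte Carlo sketch. This is not the paper's simulation design: the two-block network, the sample sizes, and the parameter values are hypothetical, and because the correction $A_k$, the limit $t_k$, and the asymptotic standard deviation are not reproduced on this page, the draws are centered and scaled by their own empirical mean and standard deviation.

```python
import numpy as np

def sample_two_block_network(n, p_in, p_out, rng):
    """Hypothetical test network: symmetric 0/1 adjacency with two communities."""
    labels = np.arange(n) % 2
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # no self-loops
    return (upper | upper.T).astype(float)

def kth_spiked_eigenvalue(adj, alpha, k, tau=0.0, lam=0.0):
    """k-th largest-in-magnitude eigenvalue of L^{-alpha} A L^{-alpha}."""
    degrees = adj.sum(axis=1)
    scale = (degrees + tau * degrees.mean() + lam) ** (-alpha)
    x = scale[:, None] * adj * scale[None, :]
    vals = np.linalg.eigvalsh(x)
    return vals[np.argsort(-np.abs(vals))[k - 1]]

def gaussian_kde(samples, grid):
    """Gaussian KDE with the normal-reference bandwidth 1.06 * sd * n^(-1/5)."""
    samples = np.asarray(samples, dtype=float)
    h = 1.06 * samples.std(ddof=1) * samples.size ** (-0.2)
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (samples.size * h * np.sqrt(2 * np.pi))

def standard_normal_pdf(grid):
    return np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)

rng = np.random.default_rng(0)
replications = 200  # the captions use 500; fewer here to keep the sketch fast
draws = np.array([
    kth_spiked_eigenvalue(sample_two_block_network(150, 0.3, 0.1, rng),
                          alpha=0.25, k=2, tau=0.1)
    for _ in range(replications)
])

# Empirical standardization stands in for the paper's centering and scaling.
standardized = (draws - draws.mean()) / draws.std(ddof=1)
grid = np.linspace(-4.0, 4.0, 201)
kde = gaussian_kde(standardized, grid)
max_gap = np.max(np.abs(kde - standard_normal_pdf(grid)))
print("largest pointwise gap between KDE and N(0,1) density:", max_gap)
```

Plotting `kde` against `standard_normal_pdf(grid)` gives a rough analogue of the blue and red curves. Replacing the empirical centering and scaling with the paper's $t_k$, $A_k$, and asymptotic standard deviation would be needed to check the theory itself.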

Theorems & Definitions (53)

  • Definition 1
  • Remark 1
  • Example 1: DCMM model
  • Definition 2
  • Lemma 1
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Remark 2
  • Remark 3
  • ...and 43 more