Estimating Graph Dimension with Cross-validated Eigenvalues

Fan Chen; Sebastien Roch; Karl Rohe; Shuqi Yu

TL;DR

This work introduces cross-validated eigenvalues to estimate the latent dimension $k$ in random graph models without strict parametric assumptions. The method relies on edge splitting to construct independent split graphs, preserving population eigenvectors, and a central limit theorem to produce p-values for each eigenvector, enabling consistent estimation of $k$ when all signal dimensions are detectable. It provides a flexible, scalable alternative to existing techniques, with theoretical guarantees under Poisson and Bernoulli graph models and strong empirical performance on simulated and real networks. The approach yields interpretable results and competitive accuracy with substantially reduced computational cost, broadening practical applicability in network science and high-dimensional spectral inference.

Abstract

In applied multivariate statistics, estimating the number of latent dimensions or the number of clusters, $k$, is a fundamental and recurring problem. We study a sequence of statistics called "cross-validated eigenvalues." Under a large class of random graph models, including both Poisson and Bernoulli edges, without parametric assumptions, we provide a $p$-value for each cross-validated eigenvalue. Each $p$-value tests the null hypothesis that the corresponding sample eigenvector is orthogonal to (i.e., uncorrelated with) the true latent dimensions. This approach naturally adapts to problems where some dimensions are not statistically detectable. In scenarios where all $k$ dimensions can be estimated, we show that our procedure consistently estimates $k$. In simulations and a data example, the proposed estimator compares favorably to alternative approaches in both computational and statistical performance.
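To make the idea concrete, the following is a minimal sketch of how edge splitting and a normal approximation could yield one $p$-value per eigenvector. It is not the authors' reference implementation. The function names (`simulate_sbm`, `split_edges`, `cv_eigen_pvalues`, `estimate_k`), the plug-in variance estimate, the one-sided test, the choice of the algebraically largest eigenvalues, and the stop-at-first-failure rule without a multiplicity correction are all assumptions made for illustration.

```python
import math
import numpy as np


def simulate_sbm(n, p_in, p_out, rng):
    """Hypothetical helper: two-block stochastic block model, no self-loops."""
    labels = np.arange(n) % 2
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1).astype(np.int64)
    return upper + upper.T


def split_edges(A, eps, rng):
    """Send each undirected edge to the test graph with probability eps."""
    upper = np.triu(A, k=1).astype(np.int64)  # count each edge once
    test_upper = rng.binomial(upper, eps)     # binomial thinning of edge counts
    train_upper = upper - test_upper
    return train_upper + train_upper.T, test_upper + test_upper.T


def cv_eigen_pvalues(A, k_max, eps=0.1, seed=0):
    """One p-value per leading eigenvector of the training graph."""
    rng = np.random.default_rng(seed)
    A_train, A_test = split_edges(A, eps, rng)
    vals, vecs = np.linalg.eigh(A_train.astype(float))
    order = np.argsort(-vals)[:k_max]         # assumption: largest eigenvalues first
    pvals = []
    for j in order:
        x = vecs[:, j]
        lam_test = x @ A_test @ x             # cross-validated eigenvalue
        # Assumed plug-in variance under Poisson-like edges and the null.
        var_hat = 2.0 * ((x ** 2) @ A_test @ (x ** 2))
        if var_hat == 0.0:
            pvals.append(1.0)
            continue
        z = lam_test / math.sqrt(var_hat)
        pvals.append(0.5 * math.erfc(z / math.sqrt(2.0)))  # one-sided normal tail
    return pvals


def estimate_k(pvals, alpha=0.05):
    """Count leading eigenvectors whose p-value falls below alpha."""
    k = 0
    for p in pvals:
        if p >= alpha:
            break
        k += 1
    return k
```

A possible usage, under the same assumptions: draw `A = simulate_sbm(300, 0.30, 0.05, np.random.default_rng(1))`, then call `estimate_k(cv_eigen_pvalues(A, k_max=5))`. Because the eigenvectors come only from the training graph and the statistic only from the held-out edges, the two are independent under a Poisson edge model, which is what makes the normal approximation plausible here.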
