
Analysis and Approximate Inference of Large Random Kronecker Graphs

Zhenyu Liao, Yuanqian Xia, Chengmei Niu, Yong Xiao

TL;DR

The paper addresses scalable parameter inference for large random Kronecker graphs by proving a spectral signal-plus-noise decomposition: the adjacency ${\mathbf A}$ is close to a low-rank signal plus zero-mean noise after appropriate permutation, with a signal ${\mathbf S}_K$ of rank at most $(m-1)K+1$ that is linear in the initiator perturbations. Building on this, the authors propose a denoise-and-solve meta algorithm that first denoises the adjacency via a shrinkage-based estimator to recover the low-rank signal, then solves a permuted linear regression to recover the initiator parameters ${\mathbf P}_1$ (or ${\mathbf X}$), achieving near-linear time complexity in the number of nodes $N$ and offering RNLA-assisted speedups. The approach is validated through synthetic experiments against KronFit and through realistic graph classification benchmarks, showing competitive or superior performance in many regimes and substantial scalability improvements. The work advances scalable graph inference and representation learning by combining high-dimensional spectral analysis with practical, scalable optimization for Kronecker graph models.
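The rank bound above and the $O(\log N)$ rank quoted in the abstract are the same statement once $N = m^K$ is used (as in the Figure 4 caption). The short derivation below assumes the initiator size $m$ stays fixed as $N$ grows:

$$
\operatorname{rank}(\mathbf{S}_K) \;\le\; (m-1)K + 1 \;=\; (m-1)\log_m N + 1 \;=\; O(\log N).
$$

For instance, with $m = 2$ and $K = 12$ one gets $N = 2^{12} = 4\,096$ and a rank bound of $(2-1)\cdot 12 + 1 = 13$.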

Abstract

Random graph models are playing an increasingly important role in fields ranging from social networks and telecommunication systems to physiological and biological networks. Within this landscape, the random Kronecker graph model emerges as a prominent framework for studying intricate real-world networks. In this paper, we investigate large random Kronecker graphs, i.e., graphs for which the number of vertices $N$ is large. Building upon recent advances in random matrix theory (RMT) and high-dimensional statistics, we prove that the adjacency of a large random Kronecker graph can be decomposed, in a spectral norm sense, into two parts: a small-rank (of rank $O(\log N)$) signal matrix that is linear in the graph parameters, and a zero-mean random noise matrix. Based on this result, we propose a "denoise-and-solve" approach to infer the key graph parameters, with significantly reduced computational complexity. Experiments on both graph inference and classification are presented to evaluate the proposed method. In both tasks, the proposed approach yields comparable or better performance than widely used graph inference methods (e.g., KronFit) and graph neural network baselines, at a time cost that scales linearly with the graph size $N$.
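To make the model concrete, here is a minimal Python sketch of how a random Kronecker graph is commonly sampled: the probability matrix is the $K$-th Kronecker power of a small initiator, and each edge is drawn independently. The function names, the initiator values, and the choice $K = 8$ are illustrative assumptions, and the sketch omits the vertex permutation and the specific parameterization of $\mathbf{P}_1$ used in the paper.

```python
import numpy as np

def kronecker_power(P1: np.ndarray, K: int) -> np.ndarray:
    """Return the K-th Kronecker power of the m x m initiator P1 (size m^K x m^K)."""
    PK = np.array([[1.0]])
    for _ in range(K):
        PK = np.kron(PK, P1)
    return PK

def sample_adjacency(PK: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Draw a directed graph: edge (i, j) is present independently with probability PK[i, j]."""
    return (rng.random(PK.shape) < PK).astype(np.int8)

# Hypothetical 2 x 2 initiator; the values are illustrative, not taken from the paper.
P1 = np.array([[0.9, 0.6],
               [0.5, 0.3]])
K = 8
PK = kronecker_power(P1, K)                          # probability matrix, N = 2**8 = 256
A = sample_adjacency(PK, np.random.default_rng(0))   # one sampled 0/1 adjacency matrix
N = A.shape[0]
```

The dense construction costs $O(N^2)$ memory, so it is only meant for small $K$; it is not the scalable procedure the paper proposes.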

Paper Structure (28 sections, 13 theorems, 81 equations, 5 figures, 2 tables, 3 algorithms)

Key Result

Proposition 1

Under Assumption \ref{ass:growth-rate} and for $N$ large, we have, for $\mathbf{P}_K \in {\mathbb{R}}^{N \times N}$ the $K$-th Kronecker power of $\mathbf{P}_1$ as in \eqref{eq:def_bP_K}, that:

Figures (5)

  • Figure 1: Estimation MSEs (left) and running time (right) of KronFit versus Algorithm \ref{algo:meta} (with IHT and convex relaxation), on random Kronecker graphs as in Definition \ref{def:random_Kronecker_graph}, with $p\in [0.3, 0.8]$, $\mathbf{x} = [5.25, 0.25, 2.25, -7.75]$, and $20\%$ of vertices randomly shuffled, for $N = 1\,024, 2\,048$, and $s = 5$ in Algorithm \ref{alg:permuted_LR} for IHT. Results obtained over $10$ independent runs on the same graph.
  • Figure 2: Estimation MSEs of Algorithm \ref{algo:meta} on Kronecker graphs as in Definition \ref{def:random_Kronecker_graph} with $p = 0.7$, $N = 1\,024, 2\,048$, and the same $\mathbf{x}$ as in Figure \ref{fig:compare_Kronfit_IHT}, as a function of the percentage of node permutation.
  • Figure 3: Estimation MSEs (left) and running time (right) of Algorithm \ref{algo:meta}, with and without the RNLA acceleration in Remark \ref{rem:complexity}, on random Kronecker graphs as in Figure \ref{fig:compare_Kronfit_IHT} for $N = 2\,048$. For randomized SVD \cite{halko2011Finding}, we use an iteration count of $q=2$; for random sampling, we choose $100$ from $N$ blocks uniformly at random. Results obtained over $10$ independent runs.
  • Figure 4: (Left) Histogram of singular values of $\bar{\mathbf{A}}/\sqrt{\bar{p} (1- \bar{p})}$ (blue) versus the limiting quarter-circle law spectrum and spikes (red). (Right) Left singular vector associated with the largest singular value of $\bar{\mathbf{A}}$ (blue), versus the (rescaled, according to \ref{theo:bar_A}) top left singular vector of $\mathbf{S}_K^{\boldsymbol{\Pi} = \mathbf{I}_N}$ (red). A similar observation can be made for right singular vectors, but with larger random fluctuation. Here $m = 2$ and $K= 12$, so that $N = m^K = 4\,096$, with $p = 0.7$ and ${\rm vec}(\mathbf{X}) = [-5.5,5.5,-1.5,1.5]^{\sf T}$.
  • Figure 5: Estimation MSEs (left) and running time (right) of the moment-based method, the KronFit algorithm, and the proposed approach on random undirected Kronecker graphs with $p$ ranging from $0.3$ to $0.8$, $\mathbf{x} = [4.75, 1.75, 1.75, -8.25]$, and $20\%$ of vertices randomly shuffled, for $N = 1\,024$ and $2\,048$. Results obtained over $10$ independent runs.
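The quarter-circle bulk mentioned in the Figure 4 caption can be illustrated with a small Python sketch. It is a simplified stand-in, not the paper's experiment: it uses a homogeneous Bernoulli($p$) matrix with no Kronecker signal (so no spikes appear), and the $1/\sqrt{N}$ scaling and the values $N = 400$, $p = 0.7$ are assumptions chosen for illustration. After centering and normalizing to unit entry variance, the singular values should fill $[0, 2]$ with density $\sqrt{4 - x^2}/\pi$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 400, 0.7  # illustrative size and edge probability (assumptions)

# Homogeneous Bernoulli(p) "adjacency": pure noise around its mean, no low-rank signal.
A = (rng.random((N, N)) < p).astype(float)

# Center, then scale so entries have variance 1/N; the bulk then lies in [0, 2].
Z = (A - p) / np.sqrt(N * p * (1.0 - p))
sv = np.linalg.svd(Z, compute_uv=False)

def quarter_circle_density(x):
    """Limiting singular value density sqrt(4 - x^2) / pi on [0, 2], zero elsewhere."""
    x = np.asarray(x, dtype=float)
    inside = (x >= 0.0) & (x <= 2.0)
    return np.where(inside, np.sqrt(np.clip(4.0 - x**2, 0.0, None)) / np.pi, 0.0)

# Empirical histogram on [0, 2] versus the limiting density at the bin centers.
hist, edges = np.histogram(sv, bins=20, range=(0.0, 2.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_gap = np.max(np.abs(hist - quarter_circle_density(centers)))

print(f"largest singular value: {sv.max():.3f} (bulk edge at 2)")
print(f"max |histogram - quarter-circle density|: {max_gap:.3f}")
```

In the setting of the figure, a low-rank signal added to such a noise matrix would show up as isolated singular values outside the bulk, which is what a shrinkage-type denoiser exploits.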

Theorems & Definitions (26)

  • Definition 1: Random Kronecker graph
  • Remark 1: Vertices matching
  • Remark 2: On Assumption \ref{ass:growth-rate}
  • Proposition 1: Approximate small-rankness of $\mathbf{P}_K$
  • Theorem 1: Signal-plus-noise decomposition for $\mathbf{A}$
  • Lemma 1: Consistent estimation of $p$
  • Proposition 2: Signal-plus-noise decomposition for $\bar{\mathbf{A}}$
  • Remark 3: Time complexity of Algorithms \ref{alg:shrinkage_estim} and \ref{alg:permuted_LR}
  • Remark 4: Numerical stability
  • Lemma 2: Weyl's inequality \cite{horn2012matrix}
  • ...and 16 more