Random line graphs and edge-attributed network inference

Zachary Lubberts, Avanti Athreya, Youngser Park, Carey E. Priebe

TL;DR

The paper develops a probabilistic spectral framework for random line graphs to enable edge-directed inference. By embedding the line-graph adjacency matrix into that of the line graph of the complete graph and applying a diagonal projection, it obtains a fixed mean structure and proves eigenvalue concentration and limiting results, including a central limit theorem (CLT) for Erdős–Rényi (ER) line graphs and a canonical limit of the empirical spectral distribution (ESD) given by a free convolution. For line graphs of stochastic blockmodels (SBMs), it shows that there is a signal-preserving subspace tied to the edge clusters, and that a carefully constructed projection yields consistent estimates of edge latent positions even though the line graph has random size and no spectral gap. Simulations show that integrating edge covariates via scMASE with the projection improves edge clustering over vertex-only or covariate-only approaches, so combining vertex and edge information helps in practice. The framework may extend to broader random-graph models and covariate settings.

Abstract

We extend the latent position random graph model to the line graph of a random graph, which is formed by creating a vertex for each edge in the original random graph, and connecting each pair of edges incident to a common vertex in the original graph. We prove concentration inequalities for the spectrum of a line graph, as well as limiting distribution results for the largest eigenvalue and the empirical spectral distribution in certain settings. For the stochastic blockmodel, we establish that although naive spectral decompositions can fail to extract necessary signal for edge clustering, there exist signal-preserving singular subspaces of the line graph that can be recovered through a carefully chosen projection. Moreover, we can consistently estimate edge latent positions in a random line graph, even though such graphs are of a random size, typically have high rank, and possess no spectral gap. Our results demonstrate that the line graph of a stochastic blockmodel exhibits underlying block structure, and in simulations, we synthesize and test our methods against several commonly used techniques, including tensor decompositions, for cluster recovery and edge covariate inference. By naturally incorporating information encoded in both vertices and edges, the random line graph improves network inference.
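
The line-graph construction described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function name `line_graph` and the edge-list representation are assumptions made here, and the input is assumed to be a simple undirected graph with no self-loops.

```python
from itertools import combinations


def line_graph(edges):
    """Build the line graph of a simple undirected graph.

    Returns (nodes, lg_edges): `nodes` lists the original edges as sorted
    pairs, and `lg_edges` lists pairs of indices into `nodes` that are
    adjacent in the line graph.
    """
    # Normalize each edge to (min, max) and drop duplicates.
    nodes = sorted({(min(u, v), max(u, v)) for u, v in edges})
    index = {e: k for k, e in enumerate(nodes)}

    # For each original vertex, collect the indices of the edges touching it.
    incident = {}
    for e in nodes:
        for v in e:
            incident.setdefault(v, []).append(index[e])

    # Two edges are adjacent in the line graph iff they share an endpoint.
    lg_edges = set()
    for idxs in incident.values():
        for a, b in combinations(idxs, 2):
            lg_edges.add((min(a, b), max(a, b)))
    return nodes, sorted(lg_edges)


# A path on four vertices: its line graph is a path on three vertices.
path_nodes, path_lg = line_graph([(0, 1), (1, 2), (2, 3)])

# A star with three leaves: its line graph is a triangle.
star_nodes, star_lg = line_graph([(0, 1), (0, 2), (0, 3)])
```

Because each original vertex of degree $d$ contributes $\binom{d}{2}$ adjacent edge pairs, the size of the line graph depends on the realized degrees, which is one reason a random graph yields a line graph of random size.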

Paper Structure

This paper contains 13 sections, 17 theorems, 170 equations, 5 figures.

Key Result

Proposition 1

Let $G=([n],E)$ be a graph, $L(G)=(E,\mathcal{E})$ its line graph, and $\widehat{m}=|E|$. Then [...] where $D$ is the diagonal matrix with $D_{i,i}=\mathrm{deg}_{G}(i)$, $1\leq i\leq n$. In the latter case, [...]
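
The displayed formulas of Proposition 1 are not reproduced on this page, so the sketch below does not restate the proposition. It only checks two standard identities that link a graph to its line graph through the vertex-edge incidence matrix, written here as $B$ (a name assumed for this example): $B^{\top}B = A(L(G)) + 2I$ and $BB^{\top} = A(G) + D$, with $D$ the degree matrix defined above. The helper names and the small example graph are hypothetical.

```python
def incidence_matrix(n, edges):
    """n x m 0/1 matrix with B[v][k] = 1 iff vertex v is an endpoint of edge k."""
    B = [[0] * len(edges) for _ in range(n)]
    for k, (u, v) in enumerate(edges):
        B[u][k] = 1
        B[v][k] = 1
    return B


def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


# Small example graph (assumed for illustration): a triangle with a pendant edge.
n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
m_hat = len(edges)

# Adjacency matrix A(G) and vertex degrees.
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = 1
deg = [sum(row) for row in A]

# Adjacency matrix A(L(G)), directly from the definition:
# two distinct edges are adjacent iff they share an endpoint.
AL = [
    [int(i != j and bool(set(edges[i]) & set(edges[j]))) for j in range(m_hat)]
    for i in range(m_hat)
]

B = incidence_matrix(n, edges)
BtB = matmul(transpose(B), B)  # m_hat x m_hat
BBt = matmul(B, transpose(B))  # n x n

# Check B^T B = A(L(G)) + 2I entry by entry.
gram_ok = all(
    BtB[i][j] == AL[i][j] + (2 if i == j else 0)
    for i in range(m_hat)
    for j in range(m_hat)
)

# Check B B^T = A(G) + D entry by entry.
signless_ok = all(
    BBt[i][j] == A[i][j] + (deg[i] if i == j else 0)
    for i in range(n)
    for j in range(n)
)
```

Since $B^{\top}B$ and $BB^{\top}$ share their nonzero eigenvalues, these identities tie the spectrum of $A(L(G)) + 2I$ to that of $A(G) + D$.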

Figures (5)

  • Figure 1: A stochastic blockmodel graph and its line graph.
  • Figure 2: A three-block SBM adjacency matrix with its best block-constant estimate. The line graph of this matrix with its best block-constant estimate, corresponding to $T_1$.
  • Figure 3: Depiction of the embedding obtained by using the left singular vectors of $A(L(G))\hat{Q}$ corresponding to the largest three singular values. GMM clustering using the first two singular vectors results in an ARI of 0.5 when compared to the true clusters, but GMM clustering using the second two singular vectors results in an ARI of 1, indicating perfect recovery of the clusters. Our method reveals the underlying edge communities.
  • Figure 4: Depiction of the embedding obtained by using the left singular vectors of $A(L(G))$ corresponding to the largest three singular values. GMM clustering using any two of these singular vectors results in an ARI of 0 when compared to the true clusters, indicating a clustering with no relation to the true memberships. This naive method fails to extract important signal, namely the edge cluster memberships.
  • Figure 5: Comparison of edge clusterings based on (i) adjacency matrix of the original graph only, using induced clustering (blue); (ii) edge covariates only (yellow); (iii) adjacency matrix of the line graph, using a projection matrix, with no covariates (purple); (iv) adjacency matrix of the line graph with covariates, but no projection matrix (green); (v) adjacency matrix of the line graph with covariates and projection matrix (orange); (vi) tensor decomposition mode 3 embeddings; (vii) tensor decomposition modes 1 and 2 embeddings. All clusterings are compared to ground truth, measured using the adjusted Rand index. We see that combining the information from both sources using scMASE with the projection yields results comparable to or better than the best of the approaches that use only one source of data. All experiments consider a 3-block SBM with $n=300$ and 50 trials. See the text for a full description of the procedure used to generate these plots.
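
The captions above score clusterings with the adjusted Rand index (ARI). As a reminder of what that number measures, here is a minimal pure-Python sketch of the standard pair-counting formula. It is not the code used for the paper's experiments; the function name and the handling of degenerate inputs are choices made for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # convention chosen here for fewer than two items

    # Contingency counts and their marginals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Pairs placed together in both, in the first, and in the second labeling.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # both labelings are trivial in the same way
    return (sum_cells - expected) / (max_index - expected)


# Same partition with the labels swapped: the index ignores label names.
ari_relabelled = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])

# A partition that cuts across the true one scores below zero.
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

An ARI of 1 means the two partitions agree up to relabeling, values near 0 are what chance agreement would produce, and negative values are possible.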

Theorems & Definitions (42)

  • Definition 1.1
  • Definition 1.2
  • Definition 1.3
  • Proposition 1
  • Corollary 1
  • Theorem 2.1
  • Remark 1
  • Theorem 2.2
  • Theorem 2.3
  • Definition 2.4
  • ...and 32 more