Correlation and Autocorrelation of Data on Complex Networks

Rudy Arthur

TL;DR

The paper adapts Moran-type spatial statistics to networks with real-valued node data, introducing global and local autocorrelation, bivariate network correlation, and correlograms for general graphs. It formalises Moran's $I$ with a weight matrix $W$, introduces Node Indicators of Network Association (NINA) for local structure, and compares data-permutation and configuration-model nulls to assess significance. For cross-variable analysis, it exposes limitations of Pearson correlation under autocorrelation and proposes Lee's $L$ as a robust alternative, illustrating it on synthetic networks. The methodology is demonstrated on real (Wikipedia, EgoMinusEgo network) and synthetic networks, showing meaningful autocorrelation patterns tied to modular structure and highlighting practical tools and implementations for network-based exploratory data analysis.

Abstract

Networks where each node has one or more associated numerical values are common in applications. This work studies how summary statistics used for the analysis of spatial data can be applied to non-spatial networks for the purposes of exploratory data analysis. We focus primarily on Moran-type statistics and discuss measures of global autocorrelation, local autocorrelation and global correlation. We introduce null models based on fixing edges and permuting the data or fixing the data and permuting the edges. We demonstrate the use of these statistics on real and synthetic node-valued networks.
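As a rough illustration of the "fix the edges, permute the data" null described above, the following minimal sketch computes the textbook form of Moran's index, $I = \frac{N}{\sum_{ij} w_{ij}} \frac{\sum_{ij} w_{ij}(x_i-\bar{x})(x_j-\bar{x})}{\sum_i (x_i-\bar{x})^2}$, and a one-sided pseudo p-value from random permutations of the node values. The function names, the use of a binary adjacency matrix as $W$, the toy graph and the number of permutations are all assumptions made for this example; the paper's own normalisation and implementation may differ.

```python
import numpy as np

def morans_i(W, x):
    """Textbook global Moran's I for node values x and weight matrix W."""
    W = np.asarray(W, dtype=float)
    x = np.asarray(x, dtype=float)
    z = x - x.mean()                      # centre the data
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

def permutation_p_value(W, x, n_perm=999, seed=0):
    """Data-permutation null: keep the edges fixed, shuffle the node values."""
    rng = np.random.default_rng(seed)
    observed = morans_i(W, x)
    null = np.array([morans_i(W, rng.permutation(x)) for _ in range(n_perm)])
    # One-sided pseudo p-value; the +1 terms count the observed value itself.
    p = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p

# Hypothetical toy network: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
W = np.zeros((6, 6))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0               # undirected, unweighted (assumption)

# Node values aligned with the two modules, so I should be clearly positive.
x = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

I_obs, p = permutation_p_value(W, x)
print(f"Moran's I = {I_obs:.3f}, pseudo p-value = {p:.3f}")
```

For this toy graph the index works out to $5/7 \approx 0.714$. The configuration-model null mentioned in the abstract would instead keep `x` fixed and rewire the edges while preserving node degrees; it is not sketched here.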

Paper Structure (15 sections, 13 equations, 11 figures, 2 tables)

Figures (11)

  • Figure 1: Left: LFR benchmark network, Middle: Karate club, Right: Erdős-Rényi graph. Top: Force directed layout, node colours indicate node values initialised as described in the text with $M=10$ for LFR and Karate club, $M=30$ for ER, and $\sigma=0.1$ for all three. Bottom: Distribution of values from random permutations of the data, and configuration model.
  • Figure 2: Center shows the Moran scatter plot. The nodes in each quadrant are labelled in the corresponding network diagram.
  • Figure 3: Local network statistics. Left: Data-permutation null. Right: Configuration model null. Top: Network with 'interesting' nodes under the corresponding null models labelled. Bottom: Histograms of node Moran index, bins containing significant values are coloured orange.
  • Figure 4: Different propagation processes on the same LFR network. The network Moran index is shown for each data set. All $I$ values are highly statistically significant.
  • Figure 5: Left: LFR network. Right: Moran Correlogram $I(d)$. Solid blue points have p-value (under the data permutation null) $<0.01$; open points are above this threshold.
  • ...and 6 more figures