Correlation and Autocorrelation of Data on Complex Networks
Rudy Arthur
TL;DR
The paper adapts Moran-type spatial statistics to networks with real-valued node data, introducing global and local autocorrelation, bivariate network correlation, and correlograms for general graphs. It formalises Moran's $I$ with a weight matrix $W$, introduces Node Indicators of Network Association (NINA) to capture local structure, and compares data-permutation and configuration-model null models for assessing significance. For cross-variable analysis, it shows that Pearson correlation can mislead when the data are autocorrelated and proposes Lee's $L$ as a robust alternative, illustrated on synthetic networks. The methods are demonstrated on real networks (Wikipedia, EgoMinusEgo) and on synthetic networks, where the autocorrelation patterns they reveal are tied to modular structure. The paper also points to practical tools and implementations for network-based exploratory data analysis.
Abstract
Networks where each node has one or more associated numerical values are common in applications. This work studies how summary statistics used for the analysis of spatial data can be applied to non-spatial networks for the purposes of exploratory data analysis. We focus primarily on Moran-type statistics and discuss measures of global autocorrelation, local autocorrelation and global correlation. We introduce null models based either on fixing the edges and permuting the data, or on fixing the data and permuting the edges. We demonstrate the use of these statistics on real and synthetic node-valued networks.
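
As a rough illustration of the global autocorrelation statistic and the data-permutation null described above, the following minimal sketch computes Moran's $I$ on a small undirected, unweighted graph and estimates a one-sided p-value by shuffling node values while keeping the edges fixed. It assumes the standard form $I = \frac{N}{\sum_{ij} w_{ij}} \frac{\sum_{ij} w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$ with $w_{ij}$ taken to be the adjacency matrix. The function names `morans_i` and `data_permutation_test`, the edge-list representation, and the toy graph are hypothetical choices made for this example, not the paper's implementation.

```python
import random


def morans_i(edges, x):
    """Moran's I for an undirected, unweighted graph given as an edge list.

    Assumes w_ij = 1 for each edge (i, j) and 0 otherwise, and that x is
    not constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    # Each undirected edge contributes both w_ij and w_ji, hence the factor 2.
    cross = 2.0 * sum(z[i] * z[j] for i, j in edges)
    total_weight = 2.0 * len(edges)
    sum_sq = sum(v * v for v in z)
    return (n / total_weight) * cross / sum_sq


def data_permutation_test(edges, x, n_perm=999, seed=0):
    """Fix the edges, permute the node values, and return (I, p-value).

    The p-value is one-sided: the fraction of permutations whose Moran's I
    is at least as large as the observed value, with the usual +1 correction.
    """
    rng = random.Random(seed)
    observed = morans_i(edges, x)
    shuffled = list(x)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(edges, shuffled) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)


# Toy example (assumed data): a path graph 0-1-2-3 with increasing values.
edges = [(0, 1), (1, 2), (2, 3)]
x = [1.0, 2.0, 3.0, 4.0]
i_obs, p_value = data_permutation_test(edges, x)
print(f"Moran's I = {i_obs:.4f}, permutation p-value = {p_value:.3f}")
```

On this toy path graph neighbouring nodes carry similar values, so the observed $I$ is positive. The alternative null mentioned in the abstract, fixing the data and permuting the edges, would instead resample the graph (for example with a configuration model) and recompute $I$ on each sample; that variant is not shown here.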
