When does the mean network capture the topology of a sample of networks?
François G Meyer
TL;DR
This paper analyzes when the Fréchet mean (network barycenter) faithfully captures the topology of a sample of networks drawn from a two-community stochastic block model. It compares two distances, the Hamming distance $d_H$ and the resistance perturbation distance $d_{rp}$, and derives analytical expressions for the sample Fréchet mean under each. The key findings are that $d_H$-based means collapse to the majority-rule median (and are empty in sparse regimes), while $d_{rp}$-based means converge to the population mean adjacency $\bm{P}$, effectively recovering the underlying community structure. The results provide practical guidance on metric choice for network-valued ML tasks and establish theoretical guarantees for mean estimation in network ensembles, with explicit expressions for mean resistances and edge densities.
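The majority-rule behaviour of the Hamming mean follows from one fact. For unweighted graphs on a common vertex set, the sum of Hamming distances to the sample splits edge by edge. The minimiser therefore keeps edge $(i,j)$ exactly when it appears in more than half of the sample graphs. Below is a minimal NumPy sketch of that fact, not the paper's code. The helper names `sample_sbm` and `hamming_frechet_mean` are hypothetical. So are the equal-size community layout and the tie-breaking rule (ties go to "no edge").

```python
import numpy as np


def sample_sbm(n, p, q, rng):
    """Sample one undirected two-community SBM adjacency matrix (no self-loops).

    Hypothetical layout: nodes 0..n//2-1 form community 1, the rest community 2.
    Within-community edges appear with probability p, between-community with q.
    """
    labels = np.arange(n) >= n // 2
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p, q)
    # Draw the strict upper triangle only, then mirror it to get a symmetric matrix.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(int)


def hamming_frechet_mean(graphs):
    """Fréchet mean of unweighted graphs under the Hamming distance.

    The objective sum_k d_H(G, G_k) decomposes over vertex pairs, so the
    minimiser keeps edge (i, j) iff it occurs in more than half of the sample
    (ties are broken towards 'no edge' in this sketch).
    """
    stack = np.asarray(graphs)
    counts = stack.sum(axis=0)
    return (2 * counts > len(stack)).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Sparse regime: p, q < 1/2, so the majority-rule mean is almost surely empty.
    sparse = [sample_sbm(20, 0.1, 0.02, rng) for _ in range(51)]
    print("edges in Hamming mean (sparse):", hamming_frechet_mean(sparse).sum() // 2)
    # The entrywise sample mean, by contrast, estimates the population matrix P.
    P_hat = np.mean(sparse, axis=0)
    print("estimated within-community density:", P_hat[:10, :10].sum() / (10 * 9))
```

In a dense regime ($p > 1/2 > q$), the same rule returns two complete blocks. It recovers the partition but reports an edge density of 1 instead of $p$.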
Abstract
The Fréchet mean network (also known as the "barycenter") is the workhorse of most machine learning algorithms that require the estimation of a "location" parameter to analyse network-valued data. In this context, it is critical that the network barycenter inherits the topological structure of the networks in the training dataset. The metric, which measures the proximity between networks, controls the structural properties of the barycenter. This work is significant because it provides, for the first time, analytical estimates of the sample Fréchet mean for the stochastic blockmodel, a model at the cutting edge of rigorous probabilistic analysis of random networks. We show that the mean network computed with the Hamming distance is unable to capture the topology of the networks in the training sample. In contrast, the mean network computed using the effective resistance (resistance perturbation) distance recovers the correct partitions and the associated edge density. From a practical standpoint, our work informs the choice of metric in settings where the sample Fréchet mean network is used to characterise the topology of networks in network-valued machine learning.
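The resistance-based distance compares graphs through their matrices of effective resistances. For a connected graph, $R_{ij} = L^{+}_{ii} + L^{+}_{jj} - 2L^{+}_{ij}$, where $L^{+}$ is the pseudoinverse of the combinatorial Laplacian $L = D - A$. Below is a minimal NumPy sketch under stated assumptions, not the paper's implementation. The function names are hypothetical. The graphs are assumed connected; for disconnected graphs the pseudoinverse yields finite values that are not true resistances. The distance is taken in a Frobenius ($p = 2$) form over all ordered pairs $(i,j)$, and the paper's exact normalisation may differ.

```python
import numpy as np


def effective_resistance(A):
    """Effective-resistance matrix R of a connected, possibly weighted, graph.

    Computes R[i, j] = Lp[i, i] + Lp[j, j] - 2 * Lp[i, j], where Lp is the
    Moore-Penrose pseudoinverse of the Laplacian L = D - A.
    """
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2 * Lp


def resistance_perturbation_distance(A1, A2):
    """Assumed p = 2 form: Frobenius norm of the difference of resistance matrices."""
    return np.linalg.norm(effective_resistance(A1) - effective_resistance(A2))


if __name__ == "__main__":
    path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
    triangle = np.ones((3, 3)) - np.eye(3)
    print(effective_resistance(path).round(3))
    print(resistance_perturbation_distance(path, triangle))
```

Because `effective_resistance` accepts weighted adjacency matrices, it applies equally to sampled graphs and to a weighted matrix such as the population mean adjacency $\bm{P}$ of the block model.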
