Uncertainty quantification and posterior sampling for network reconstruction
Tiago P. Peixoto
TL;DR
This work tackles the ill-posed problem of reconstructing networks from indirect data by framing it as Bayesian inference and providing an efficient MCMC algorithm that samples from the full posterior distribution over networks. It introduces a data-driven, sub-quadratic sampling scheme that concentrates on a typical edge set and near-neighbor proposals to improve mixing for large, sparse graphs, while allowing marginal edge probabilities and the posterior weights of competing reconstructions to be quantified. In synthetic and empirical case studies, posterior sampling yields more accurate reconstructions and richer uncertainty information than traditional correlation-based heuristics. The framework is model-agnostic and combines minimum description length (MDL) regularization with stochastic block model (SBM) priors, offering a scalable path toward model selection and predictive inference for unseen interactions in complex networks.
Abstract
Network reconstruction is the task of inferring the unseen interactions between elements of a system, based only on their behavior or dynamics. This inverse problem is in general ill-posed, and admits many solutions for the same observation. Nevertheless, the vast majority of statistical methods proposed for this task (formulated as the inference of a graphical generative model) can only produce a "point estimate," i.e., a single network considered the most likely. In general, this gives only a limited characterization of the reconstruction, since uncertainties and competing answers cannot be conveyed, even when those answers are structurally different yet have comparable probabilities. In this work we present an efficient MCMC algorithm for sampling from posterior distributions of reconstructed networks, which is able to reveal the full population of answers for a given reconstruction problem, weighted according to their plausibilities. Our algorithm is general, since it does not rely on specific properties of particular generative models, and is especially suited for the inference of large and sparse networks, since in this case an iteration can be performed in time $O(N\log^2 N)$ for a network of $N$ nodes, instead of $O(N^2)$, as would be the case for a more naive approach. We demonstrate the suitability of our method in providing uncertainties and consensus of solutions (which provably increases the reconstruction accuracy) in a variety of synthetic and empirical cases.
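To make the contrast between a point estimate and posterior sampling concrete, the following is a minimal sketch of the naive baseline only, not the algorithm presented in this work. It runs a Metropolis-Hastings chain that toggles uniformly chosen node pairs, so one sweep costs $O(N^2)$ proposals, and it records how often each edge is present. The function names, the toy log-odds target, and the 1/2 threshold used for the consensus network are all assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def sample_edge_marginals(n_nodes, log_post_delta, n_sweeps=2000, burn_in=200, seed=0):
    """Naive edge-toggle Metropolis-Hastings over undirected simple graphs.

    log_post_delta(edges, e) is a hypothetical callback returning the change
    in log-posterior if pair e is toggled in the current edge set.
    """
    rng = random.Random(seed)
    pairs = list(combinations(range(n_nodes), 2))
    edges = set()
    counts = dict.fromkeys(pairs, 0)
    for sweep in range(n_sweeps):
        # One sweep = one proposal per node pair, i.e. O(N^2) work.
        for _ in pairs:
            e = rng.choice(pairs)  # symmetric proposal: pick a pair, toggle it
            delta = log_post_delta(edges, e)
            if delta >= 0 or rng.random() < math.exp(delta):
                edges ^= {e}  # accept: add the edge if absent, remove if present
        if sweep >= burn_in:
            for e in edges:
                counts[e] += 1
    kept = n_sweeps - burn_in
    return {e: c / kept for e, c in counts.items()}


# Toy target (an assumption): independent edges with fixed posterior log-odds.
TOY_LOG_ODDS = {(0, 1): 3.0, (1, 2): 0.0, (0, 2): -3.0}


def toy_delta(edges, e):
    # Removing an edge loses its log-odds; adding it gains them.
    return -TOY_LOG_ODDS[e] if e in edges else TOY_LOG_ODDS[e]


marginals = sample_edge_marginals(3, toy_delta)
# Consensus network: keep edges present in more than half of the samples.
consensus = {e for e, p in marginals.items() if p > 0.5}
```

For this toy target the exact marginals are the logistic function of each log-odds value (about 0.95, 0.5, and 0.05), so the sampled frequencies can be checked against them. The pair with log-odds 0 shows why a single most likely network is an incomplete answer: it is present in roughly half of the posterior samples, a fact that only the marginals convey.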
