Uncertainty quantification and posterior sampling for network reconstruction

Tiago P. Peixoto

TL;DR

This work tackles the ill-posed problem of reconstructing networks from indirect data by framing it as Bayesian inference and delivering an efficient MCMC approach to sample the full posterior over networks. It introduces a data-driven, sub-quadratic sampling scheme that emphasizes a typical edge set and near-neighbor proposals to dramatically improve mixing for large, sparse graphs, while enabling marginal-edge probabilities and posterior weights to be quantified. Through synthetic and empirical case studies, the method demonstrates that posterior sampling yields more accurate reconstructions and rich uncertainty information, outperforming traditional correlation-based heuristics. The framework is model-agnostic and integrates MDL regularization with SBM priors, offering a scalable path toward model selection and predictive inference for unseen interactions in complex networks.

Abstract

Network reconstruction is the task of inferring the unseen interactions between elements of a system, based only on their behavior or dynamics. This inverse problem is in general ill-posed, and admits many solutions for the same observation. Nevertheless, the vast majority of statistical methods proposed for this task (formulated as the inference of a graphical generative model) can only produce a "point estimate," i.e. a single network considered the most likely. In general, this can give only a limited characterization of the reconstruction, since uncertainties and competing answers cannot be conveyed, even if their probabilities are comparable while they are structurally different. In this work we present an efficient MCMC algorithm for sampling from posterior distributions of reconstructed networks, which is able to reveal the full population of answers for a given reconstruction problem, weighted according to their plausibilities. Our algorithm is general, since it does not rely on specific properties of particular generative models, and is especially suited for the inference of large and sparse networks, since in this case an iteration can be performed in time $O(N\log^2 N)$ for a network of $N$ nodes, instead of $O(N^2)$, as would be the case for a more naive approach. We demonstrate the suitability of our method in providing uncertainties and consensus of solutions (which provably increases the reconstruction accuracy) in a variety of synthetic and empirical cases.
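To make the notion of a "consensus of solutions" concrete, the following is a minimal sketch of how marginal edge probabilities could be estimated from a set of posterior samples and then thresholded into a single consensus network. It is not the paper's algorithm: the representation of each sample as a set of `(i, j)` node pairs, the function names, the toy samples, and the threshold of 1/2 are all assumptions made for illustration.

```python
from collections import Counter

def marginal_edge_probabilities(samples):
    """Estimate pi_ij as the fraction of samples that contain edge (i, j).

    `samples` is a list of edge sets; each edge is a tuple (i, j) with i < j.
    (Hypothetical representation, chosen for this sketch.)
    """
    counts = Counter()
    for edges in samples:
        counts.update(edges)  # each sample is a set, so an edge counts once
    n = len(samples)
    return {edge: c / n for edge, c in counts.items()}

def consensus_edges(pi, threshold=0.5):
    """Keep the edges whose estimated marginal probability exceeds the
    threshold (1/2 is an assumed default, not taken from the text)."""
    return {edge for edge, p in pi.items() if p > threshold}

# Toy posterior samples, invented for illustration only.
samples = [
    {(0, 1), (1, 2)},
    {(0, 1), (1, 2), (2, 3)},
    {(0, 1), (0, 2)},
    {(0, 1), (1, 2)},
]

pi = marginal_edge_probabilities(samples)
consensus = consensus_edges(pi)
```

In this toy case no single sample needs to coincide with the consensus network; the marginals in `pi` carry the per-edge uncertainty that a point estimate would discard.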

Paper Structure

This paper contains 16 sections, 55 equations, 12 figures.

Figures (12)

  • Figure 1: Results of MCMC runs for the reconstruction of an Erdős-Rényi network of $N=5000$ nodes and average degree $2E/N = 5$, and weights sampled from a normal distribution with mean $1/5$ and standard deviation $0.01$, serving as the couplings of a kinetic Ising model (see Appendix \ref{app:models}), based on $M=500$ parallel transitions from a random initial state. Panel (a) shows the cumulative recall of the typical set, i.e. the fraction of all entries with a posterior probability $\pi_{ij}$ above a particular value that have been found in $\hat{\mathcal{E}}$, for several values of the search period $\tau$. Panel (b) shows the Jaccard similarity $s(\bm{W}',\bm{W})$ between samples $\bm{W}'$ generated by the MCMC and the true value $\bm{W}$, with ($w_t=1$) and without ($w_t=0$) the estimation of the typical edge set, and various search periods $\tau$. Panel (c) shows the same kinds of MCMC runs, but with an initial state consisting of an empty network (the inset shows a zoom in the high similarity region). Panel (d) shows the autocorrelation function for the values of similarity of the runs in panel (b), discarding the initial transient before equilibration.
  • Figure 2: Illustration of the proposed "nearby" updates according to Eq. \ref{eq:nearby}. The black edges correspond to the nonzero entries of $\bm{W}$ at some point of the algorithm, and the green edges are entries with $Q_{n}(i,j|\bm{W},d)>0$ for $d=2$, which would be proposed for an update. Edges between the different components will never be proposed for any value of $d$.
  • Figure 3: Panel (b) shows the autocorrelation time as a function of the number of nodes $N$, for a target distribution according to Eq. \ref{eq:target}, with $\bm G$ generated as described in the text, with $E=5 N / 2$ edges, and considering different combinations of the move proposals, as indicated in the legend, in the situation where the typical network is connected ($p=0.9$) and where it is disconnected ($p=0.1$), in both cases with $\epsilon=10^{-8}$. The dashed line indicates a linear slope. Panel (a) shows an illustration of the connected and disconnected cases, with black edges representing those in $\bm G$ that are currently being sampled, and the dashed edges those in $\bm G$ that are not.
  • Figure 4: Reconstruction performance based on the dynamics generated by the kinetic Ising model (see Appendix \ref{app:models}) on two empirical networks, where the weights are sampled from a normal distribution with mean $1/{\left<k\right>}$ and standard deviation $0.01$, with ${\left<k\right>}=2E/N$ being the average degree. The left panels show the results for a network of American football teams \cite{girvan_community_2002} (with $N=115$ and $E=613$), and the right panels for a network of friendship between high school students \cite{moody_peer_2001} (with $N=291$ and $E=1136$). The top panels show the similarity $s(\bm{W},\hat{\bm{W}})$ between the inferred and true networks, according to the MAP and MP estimators, as indicated in the legend, as a function of the length $M$ of the dynamics. The bottom panels show the number of edges of the inferred networks in each case. The dashed horizontal lines indicate the true value.
  • Figure 5: Reconstruction of a zero-added Ising model based on $M=619$ votes of $N=623$ deputies of the lower house of the Brazilian congress. (a) Marginal edge probabilities $\bm\pi$ indicated as edge thickness and the posterior mean $\widehat{\bm{W}}$ as edge colors. The node pie charts indicate the marginal group memberships, inferred according to the SBM incorporated in the reconstruction, as described in Ref. \cite{peixoto_network_2024}. (b) MP estimate $\widehat{\bm{W}}$ according to Eq. \ref{eq:mpe}. (c) MAP point estimate $\bm{W}^{*}$ according to Eq. \ref{eq:MAP}. (d) Distribution of marginal posterior probability values $\pi_{ij}$ across all node pairs. (e) Posterior distribution of non-zero weight values $W_{ij}$ across all node pairs. (f) Distribution of node biases $\theta_{i}$ across all nodes $i$. In (e) and (f) the vertical lines correspond to the distribution obtained with the MAP point estimate.
  • ...and 7 more figures
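
The diagnostics named in the captions of Figure 1 (similarity between a sample and the true network, and the autocorrelation function of that similarity along a run) can be sketched as follows. This is an illustration under stated assumptions only: the weighted similarity $s(\bm{W}',\bm{W})$ used in the paper is not defined in this excerpt, so the sketch substitutes the plain set-based Jaccard index on edge sets, and the function names and toy inputs are hypothetical.

```python
def jaccard(edges_a, edges_b):
    """Set-based Jaccard index between two edge sets.

    Assumption: this ignores edge weights; the paper's s(W', W) may be
    defined differently for weighted networks.
    """
    a, b = set(edges_a), set(edges_b)
    union = a | b
    # Two empty networks are treated as identical by convention.
    return len(a & b) / len(union) if union else 1.0

def autocorrelation(series, max_lag):
    """Normalized sample autocorrelation for lags 0..max_lag.

    `series` would hold one scalar per MCMC sweep (e.g. the similarity),
    after discarding the initial transient.
    """
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(series)")
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series) / n
    if var == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    acf = []
    for lag in range(max_lag + 1):
        cov = sum((series[t] - mean) * (series[t + lag] - mean)
                  for t in range(n - lag)) / (n - lag)
        acf.append(cov / var)
    return acf

# Toy inputs, invented for illustration only.
true_edges = {(0, 1), (1, 2)}
sampled_edges = {(0, 1), (2, 3)}
similarity = jaccard(sampled_edges, true_edges)

alternating = [1.0, -1.0] * 4          # strongly anti-correlated toy series
acf = autocorrelation(alternating, max_lag=2)
```

A slowly decaying autocorrelation function indicates a poorly mixing chain, which is the quantity compared across proposal schemes in the captions above.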