Graph Structure Learning with Interpretable Bayesian Neural Networks

Max Wasserman; Gonzalo Mateos

TL;DR

This framework enables graph structure learning (GSL) in modest-scale applications where uncertainty about the data structure is paramount. Fast execution and parameter efficiency allow high-fidelity posterior approximation via Markov chain Monte Carlo (MCMC), and thus uncertainty quantification on edge predictions.

Abstract

Graphs serve as generic tools to encode the underlying relational structure of data. Often this graph is not given, and so the task of inferring it from nodal observations becomes important. Traditional approaches formulate a convex inverse problem with a smoothness-promoting objective and rely on iterative methods to obtain a solution. In supervised settings where graph labels are available, one can unroll and truncate these iterations into a deep network that is trained end-to-end. Such a network is parameter-efficient and inherits inductive bias from the optimization formulation, an appealing aspect for data-constrained settings in, e.g., medicine, finance, and the natural sciences. But such settings typically care about uncertainty over edge predictions as much as about point estimates. Here we introduce novel iterations with independently interpretable parameters, i.e., parameters whose values (independent of other parameters' settings) proportionally influence characteristics of the estimated graph, such as edge sparsity. After unrolling these iterations, prior knowledge over such graph characteristics shapes prior distributions over these independently interpretable network parameters to yield a Bayesian neural network (BNN) capable of graph structure learning (GSL) from smooth signal observations. Fast execution and parameter efficiency allow for high-fidelity posterior approximation via Markov chain Monte Carlo (MCMC) and thus uncertainty quantification on edge predictions. Synthetic and real data experiments corroborate this model's ability to provide well-calibrated estimates of uncertainty, in test cases that include unveiling economic sector modular structure from S&P 500 data and recovering pairwise digit similarities from MNIST images. Overall, this framework enables GSL in modest-scale applications where uncertainty on the data structure is paramount.
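To make the "smoothness-promoting objective" concrete, here is a minimal sketch of the standard graph smoothness measure (the Dirichlet energy) and its equivalent form in terms of pairwise distances between nodal signals. The abstract does not spell out the paper's exact objective, so the function names, the random data, and the choice of this particular measure are illustrative assumptions rather than the authors' formulation.

```python
import numpy as np

def pairwise_sq_distances(X):
    """Squared Euclidean distances between the rows of X (one row per node)."""
    sq_norms = np.sum(X ** 2, axis=1)
    E = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.maximum(E, 0.0)  # clip tiny negative values caused by round-off

def dirichlet_energy(A, X):
    """Smoothness tr(X^T L X) of signals X on a graph with adjacency A."""
    L = np.diag(A.sum(axis=1)) - A  # combinatorial graph Laplacian
    return float(np.trace(X.T @ L @ X))

# Hypothetical data: N nodes, P signals observed at each node.
rng = np.random.default_rng(0)
N, P = 6, 50
X = rng.standard_normal((N, P))

# Hypothetical symmetric, non-negative adjacency with zero diagonal.
U = np.triu(rng.random((N, N)), k=1)
A = U + U.T

E = pairwise_sq_distances(X)
energy_from_laplacian = dirichlet_energy(A, X)
energy_from_distances = 0.5 * float(np.sum(A * E))
print(energy_from_laplacian, energy_from_distances)
```

The two printed values agree up to round-off. The distance form shows why a smoothness objective favors small weights on edges that join nodes with dissimilar signals.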

Paper Structure (28 sections, 14 equations, 15 figures, 3 tables, 2 algorithms)


Introduction
Towards interpretability and uncertainty quantification: Desiderata and contributions
Related Work
Model-based Formulation and Optimization Preliminaries
Graph structure learning from smooth signals
Optimization algorithms
Graph Structure Learning from Smooth Signals with Bayesian Neural Networks
Algorithm unrolling: Iterative optimization as a neural network blueprint
Stochastic model
Inference
Prediction
Bayesian Modeling of Unrolling-Based BNNs with Independent Interpretability
Prior modeling
Predictive checking
Experiments
...and 13 more sections

Figures (15)

Figure 1: Dual Proximal Gradient Descent
Figure 2: Bayesian workflow with independent interpretability. Inputs: A labeled data set $\mathcal{T}$, an inverse problem with independently interpretable parameter $\theta$ w.r.t. some characteristic of the solution, prior beliefs over this solution characteristic, and an unrolled NN that approximates solutions to the inverse problem. Bayesian Modeling: We use independent interpretability of $\theta$ to convert prior beliefs on solution characteristics to a prior distribution on the independently interpretable parameter. We use prior predictive checks to ensure priors generate data sets that encompass all plausible values of the solution characteristic, while still preferentially generating data sets that we believe are more likely a priori. If not, we can leverage independent interpretability to refine the prior. We then sample from the posterior, and use posterior predictive checks to provide a subjective validation of model fit.
Figure 3: Left: Prior modeling with independently interpretable parameter $\theta$. We run Algorithm \ref{alg:dpg-iterates} to convergence over discretized $\theta$. Top: Larger $\theta$ produces sparser graphs. Bottom: Larger $\theta$ produces smaller edge weight magnitudes. Right: Predictive Checks. The original prior generates very few data sets with densities of $\approx 0.9$, a value we feel is plausible. We can use the independent interpretability of $\theta$ w.r.t. edge density to alter its prior accordingly. A prior predictive check with this altered prior now encompasses these plausible data sets. The posterior predictive check ensures the replicated data sets (now sampled after conditioning on the training data) have similar edge densities to the observed training labels. Indeed, these edge densities, denoted as 'posterior', are tightly distributed around the average edge density of the labels.
Figure 4: Effective i.i.d. generalization. Both DPG and PDS are performant and well-calibrated BNNs, although PDS requires $\approx 3 \times$ more time for inference. Further performance gains are found with the expanded, partially stochastic DPG-MIMO-E model. The rightmost plot (reliability diagram) indicates good calibration across confidence levels; the leftmost bin is least calibrated but contains $<0.4\%$ of edges across all models. This plot shows experiments on $N=20$ RG$_{\frac{1}{3}}$ graphs. Error bars are scaled by $0.05$ for compact visual effect.
Figure 5: Qualitative i.i.d. generalization. Left: For a random test sample we show the label $\tilde{{\bm{a}}}$, and the estimated mean (pred. mean) and standard deviation (pred. stdv.) of the edge-wise marginal posterior predictive $p(\tilde{a}_i \mid \tilde{{\bm{e}}},\mathcal{T})$. Comparing pred. mean to the label $\tilde{{\bm{a}}}$ adds qualitative evidence that the model is well fit to the data. Right: The edge-wise uncertainty estimate (pred. stdv.) and error $|\tilde{a}_i - \mathbb{E}_{\boldsymbol{\Theta}|\mathcal{T}}[\tilde{a}_i|\tilde{{\bm{e}}}, \mathcal{T}]|$ have a strong positive Pearson correlation $\rho=0.91$.
...and 10 more figures
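The calibration and uncertainty summaries described in the captions of Figures 4 and 5 can be sketched as follows. This is a minimal illustration, not the authors' evaluation code. It assumes binary edge labels and a matrix of posterior predictive draws per edge; the data are synthetic stand-ins, and the helper names `edge_summaries` and `reliability_bins` are hypothetical. It makes no attempt to reproduce the reported numbers.

```python
import numpy as np

def edge_summaries(samples):
    """Edge-wise predictive mean and standard deviation.

    samples: array of shape (S draws, M edges).
    """
    return samples.mean(axis=0), samples.std(axis=0)

def reliability_bins(prob, label, n_bins=10):
    """Per confidence bin: (mean predicted prob., empirical edge rate, count)."""
    interior_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    idx = np.digitize(prob, interior_edges)  # bin index in 0..n_bins-1
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((float(prob[mask].mean()),
                         float(label[mask].mean()),
                         int(mask.sum())))
    return rows

# Synthetic stand-in for posterior predictive draws (assumption, not real output).
rng = np.random.default_rng(0)
S, M = 200, 500
true_prob = rng.random(M)                              # hypothetical edge probabilities
label = (rng.random(M) < true_prob).astype(float)      # binary test labels
samples = (rng.random((S, M)) < true_prob).astype(float)

pred_mean, pred_std = edge_summaries(samples)
abs_error = np.abs(label - pred_mean)

# Pearson correlation between edge-wise uncertainty and edge-wise error.
rho = float(np.corrcoef(pred_std, abs_error)[0, 1])

# In a well-calibrated model, the first two entries of each row are close.
bins = reliability_bins(pred_mean, label)
for mean_prob, edge_rate, count in bins:
    print(f"{mean_prob:.2f}  {edge_rate:.2f}  {count}")
print("uncertainty/error correlation:", rho)
```

Plotting the first two columns of `bins` against each other gives a reliability diagram, and the bin counts show how many edges each confidence level covers.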
