Fishnets: Information-Optimal, Scalable Aggregation for Sets and Graphs

T. Lucas Makinen; Justin Alsing; Benjamin D. Wandelt

TL;DR

Fishnets offer an information-theoretic aggregation framework for sets and graph neighborhoods by learning per-object score embeddings and inverse-Fisher weights. Using Twin Fisher-Score Networks, they aggregate per-datapoint information into near-optimal dataset summaries that saturate the available information and remain robust under distribution shifts and censorship. Empirically, Fishnets deliver scalable Bayesian inference and drop-in GNN aggregation that matches or surpasses state-of-the-art performance with far fewer learned parameters and faster training, notably on ogbn-proteins. These results suggest a practical pathway to information-rich, scalable summaries for heterogeneous data in simulation-based inference (SBI) and graph learning.

Abstract

Set-based learning is an essential component of modern deep learning and network science. Graph Neural Networks (GNNs) and their edge-free counterparts, Deepsets, have proven remarkably useful on ragged and topologically challenging datasets. The key to learning informative embeddings for set members is a specified aggregation function, usually a sum, max, or mean. We propose Fishnets, an aggregation strategy for learning information-optimal embeddings for sets of data for both Bayesian inference and graph aggregation. We demonstrate that i) Fishnets neural summaries can be scaled optimally to an arbitrary number of data objects, ii) Fishnets aggregations are robust to changes in data distribution, unlike standard Deepsets, iii) Fishnets saturate Bayesian information content and extend to regimes where MCMC techniques fail, and iv) Fishnets can be used as a drop-in aggregation scheme within GNNs. We show that by adopting a Fishnets aggregation scheme for message passing, GNNs can achieve state-of-the-art performance versus architecture size on ogbn-proteins data over existing benchmarks with a fraction of the learnable parameters and faster training time.
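To make the aggregation idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the summary is formed by summing per-object scores and per-object Fisher matrices and then applying the inverse of the summed Fisher matrix. In place of the learned twin networks it uses the analytic score and Fisher matrix of a linear-regression model with known heteroscedastic Gaussian noise; the function name `fisher_aggregate` and the expansion point `theta0` are hypothetical choices made for illustration.

```python
import numpy as np

def fisher_aggregate(x, y, sigma, theta0):
    """Aggregate a set {(x_i, y_i, sigma_i)} into a parameter summary.

    Assumed model (illustration only): y_i = m * x_i + b + N(0, sigma_i^2).
    The analytic per-object score and Fisher matrix below stand in for the
    outputs of learned score and Fisher networks.
    """
    x, y, sigma = (np.asarray(a, dtype=float) for a in (x, y, sigma))
    theta0 = np.asarray(theta0, dtype=float)

    # Gradient of the model mean with respect to (m, b), one row per object.
    grad = np.stack([x, np.ones_like(x)], axis=1)            # shape (n, 2)

    # Per-object score evaluated at theta0.
    resid = y - grad @ theta0
    scores = grad * (resid / sigma**2)[:, None]              # shape (n, 2)

    # Per-object Fisher information matrices.
    fishers = (grad[:, :, None] * grad[:, None, :]
               / sigma[:, None, None] ** 2)                  # shape (n, 2, 2)

    # Permutation-invariant sum aggregation of both quantities.
    t = scores.sum(axis=0)
    F = fishers.sum(axis=0)

    # Inverse-Fisher weighting of the summed score gives the summary.
    return theta0 + np.linalg.solve(F, t), F
```

For this linear model the summary coincides with the weighted least-squares estimate regardless of the set size or the ordering of the objects, and the summed Fisher matrix grows as more objects are added. This is one way to read the scaling claim in the abstract, under the assumptions stated above.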

Paper Structure (24 sections, 27 equations, 6 figures, 4 tables)

Introduction
Method: Optimal Aggregation of independent (heterogeneous) data
Fisher Information and Optimality Definitions
Set-like Data Likelihoods
Twin Fisher-Score Networks
Related Work
Experiments: Bayesian Information Saturation
Validation Case: Linear Regression
Robustness to changes in the underlying data distributions
Scalable Inference With Censorship and Nuisance Parameters
Graph Neural Network Aggregation
Drop-in replacement for Graph Benchmark Datasets
Focus Study on ogbn-proteins Benchmark
Modelling Uncertain Protein Associations
Discussion & Future Work
...and 9 more sections

Figures (6)

Figure 1: Representative Test ROC-AUC curves for (a) benchmark and (b) noisy proteins datasets. Fishnets aggregation within GCNs clearly saturates information more quickly than GCNs and can also handle noisy edges and contextual information through explicit weight parameterization.
Figure 2: (a) Residual maximum likelihood estimates for slope (left) and intercept (right) scatter about the truth for linear regression test datasets of size $n_{\rm data} = 10^4$. The solid pink line is obtained from a weighted average of an ensemble of Fishnets networks, which were trained on datasets of size $n_{\rm data} = 500$. (b) Slices of true (dark) and network-predicted (pink) score vector components as a function of data inputs for the $n_{\rm data}=10^4$ test set.
Figure 3: (b) Fishnets (pink) are robust to different noise distributions in test data (\ref{fig:noisedists}). Deepsets (grey) can return biased results for some parameters (left) and lossy estimates for others (right). Learned softmax aggregation appears to provide lossier and biased parameter estimates.
Figure 4: (a) Gamma population plate diagram. Circles represent random variables, boxes are deterministic quantities, and shaded variables are observed as data. The dashed line represents a possible censorship in measurement. Measurements of data $(t, s)_i$ are conducted until $n_{\rm data}$ samples are drawn. (b) The same Fishnets network can be used for inference on datasets much larger than those used in training. The twin Fishnet architecture was trained on $n_{\rm data}=500$. We then compress a target dataset and perform density estimation (green) and compare to an MCMC sampler as our true posterior (black dashed). Fishnets nearly saturates the information. We then use the same network to compress simulations of $n_{\rm data}=10^4$ to obtain the blue contours.
Figure 5: Density estimation posteriors obtained from parameter-Fishnets summary pairs are robust over training data. Each parameter's PIT test is close to uniform, which shows that the Fishnets summary posterior has successfully captured the underlying Bayesian information from the data.
...and 1 more figure
