Valid Conformal Prediction for Dynamic GNNs

Ed Davis, Ian Gallagher, Daniel John Lawson, Patrick Rubin-Delanchy

TL;DR

This paper addresses uncertainty quantification on dynamic graphs by introducing unfolded GNNs, which feed a dilated unfolding of the dynamic adjacency matrix to a standard GNN and thereby enable provably valid conformal prediction sets. By combining a time-exchangeable embedding with a split-conformal framework, validity is achieved in transductive regimes without strong assumptions, and in semi-inductive regimes under mild exchangeability and symmetry conditions. Empirical results on synthetic (SBM) and real (School, Flight, Trade) datasets show that unfolded GNNs yield higher accuracy and smaller conformal sets in many regimes. Drift in some real-world series (e.g., Trade) exposes the limits of exchangeability and motivates future enhancements. The approach remains modular, requiring no changes to existing GNN or CP routines, and points to extensions in inductive inference and drift-robust conformal strategies.

Abstract

Dynamic graphs provide a flexible data abstraction for modelling many sorts of real-world systems, such as transport, trade, and social networks. Graph neural networks (GNNs) are powerful tools allowing for different kinds of prediction and inference on these systems, but getting a handle on uncertainty, especially in dynamic settings, is a challenging problem. In this work we propose to use a dynamic graph representation known in the tensor literature as the unfolding, to achieve valid prediction sets via conformal prediction. This representation, a simple graph, can be input to any standard GNN and does not require any modification to existing GNN architectures or conformal prediction routines. One of our key contributions is a careful mathematical consideration of the different inference scenarios which can arise in a dynamic graph modelling context. For a range of practically relevant cases, we obtain valid prediction sets with almost no assumptions, even dispensing with exchangeability. In a more challenging scenario, which we call the semi-inductive regime, we achieve valid prediction under stronger assumptions, akin to stationarity. We provide real data examples demonstrating validity, showing improved accuracy over baselines, and sign-posting different failure modes which can occur when those assumptions are violated.
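
To make the conformal step concrete, the following is a minimal sketch of split conformal prediction for node classification. It is not taken from the paper: the function name `split_conformal_sets`, the score (one minus the predicted probability of the true class), and the array layout are all assumptions chosen for illustration. The probabilities would come from any trained classifier, for example a GNN applied to the unfolded graph.

```python
import numpy as np

def split_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Illustrative split conformal prediction sets (hypothetical helper).

    cal_probs:  (n, K) predicted class probabilities on calibration nodes
    cal_labels: (n,)   true integer labels of the calibration nodes
    test_probs: (m, K) predicted class probabilities on test nodes
    Returns an (m, K) boolean array; True means the class is in the set.
    """
    n = len(cal_labels)
    # Assumed non-conformity score: 1 - probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Rank of the finite-sample-corrected quantile among calibration scores.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: return every class.
        return np.ones_like(test_probs, dtype=bool)
    qhat = np.sort(scores)[k - 1]
    # Include every class whose score does not exceed the threshold.
    return (1.0 - test_probs) <= qhat
```

With exchangeable calibration and test scores, sets built this way contain the true label with probability at least 1 - alpha. The regimes discussed in the abstract concern when that exchangeability can be guaranteed for dynamic graphs.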

Paper Structure

This paper contains 21 sections, 4 theorems, 21 equations, 6 figures, 12 tables, 3 algorithms.

Key Result

Lemma 1

In the transductive regime, the prediction set output by the split conformal algorithm (Algorithm alg:split_conformal) is valid, that is,

Figures (6)

  • Figure 1: Contribution overview. a) This paper is about the representation of the collection of adjacency matrix snapshots. The baseline (current practice) approach treats these as independent and can be viewed as padding a 'block-diagonal' matrix with zeroes. Unfolding instead concatenates the snapshots column-wise, which links nodes to themselves over time. Dilation results in a square symmetric matrix. b) Which data are available at training time affects performance; we report results for transductive (all time-points are exchangeable in terms of test/train split), semi-inductive (a future period is reserved for testing), and temporal transductive (a future period is reserved for testing and calibration) regimes. c) Simulation of an i.i.d. stochastic block model showing the embedding after applying PCA. The models were trained with transductive masks. Block diagonal GCN appears to encode a significant change over time despite there being none. The embedding from UGCN is exchangeable over time, as would be expected.
  • Figure 2: Numbers of edges over time. The School and Flight data show roughly periodic/seasonal structure, while the Trade data feature drift, with the number of edges growing smoothly over time.
  • Figure 3: Performance metrics for each time window of the school dataset for unfolded GAT and block diagonal GAT. The prediction task gets more difficult at lunchtime, as shown by the drop in accuracy of both methods in the transductive case. UGAT has marginally better performance in the transductive case and significantly better performance in the semi-inductive case. Prediction set sizes increase at lunchtime, with only UGAT set sizes reacting in the semi-inductive case. Both methods maintain target coverage in the transductive case, with uncertainty increasing at the more difficult lunchtime window. UGAT also maintains target coverage in the semi-inductive case, while block GAT under-covers.
  • Figure 4: Prediction accuracy for each time window of the school dataset for unfolded GCN and block diagonal GCN. The prediction task gets more difficult at lunchtime, as shown by the drop in accuracy of both methods in the transductive case. UGCN has marginally better performance in the transductive case and significantly better performance in the semi-inductive case.
  • Figure 5: Coverage for each time window of the school dataset for unfolded GCN and block diagonal GCN. Both methods maintain target coverage on average in the transductive case, but not at every point in time. In the semi-inductive case, block GCN under-covers continuously and UGCN under-covers for the first test point.
  • ...and 1 more figures
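
The two representations contrasted in the Figure 1 caption can be sketched in a few lines of NumPy. This is a hedged illustration under stated assumptions, not the authors' code: the function names `dilated_unfolding` and `block_diagonal` are hypothetical, and the snapshots are assumed to be dense, symmetric n x n arrays over a shared node set.

```python
import numpy as np

def dilated_unfolding(snapshots):
    """Column-concatenate T snapshots, then dilate to a square symmetric matrix.

    snapshots: list of T arrays of shape (n, n).
    Returns an (n + n*T, n + n*T) array [[0, U], [U.T, 0]], where U is the
    (n, n*T) unfolding. The first n rows are one copy of each node shared
    across time, so every time-specific copy attaches to the same anchors.
    """
    unfolded = np.hstack(snapshots)                    # shape (n, n*T)
    n, nT = unfolded.shape
    top = np.hstack([np.zeros((n, n)), unfolded])
    bottom = np.hstack([unfolded.T, np.zeros((nT, nT))])
    return np.vstack([top, bottom])

def block_diagonal(snapshots):
    """Baseline: place each snapshot on the diagonal, zeroes elsewhere."""
    n, T = snapshots[0].shape[0], len(snapshots)
    out = np.zeros((n * T, n * T))
    for t, A in enumerate(snapshots):
        out[t * n:(t + 1) * n, t * n:(t + 1) * n] = A
    return out
```

Either matrix can be passed to a standard GNN as the adjacency of a single static graph. The difference is that the block-diagonal graph has no edges between time points, whereas the dilated unfolding connects them through the shared node copies.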

Theorems & Definitions (6)

  • Lemma 1
  • Theorem 1
  • Lemma 2
  • proof
  • Theorem 2
  • proof