Conformal Prediction for Federated Graph Neural Networks with Missing Neighbor Information

Ömer Faruk Akgül, Rajgopal Kannan, Viktor Prasanna

TL;DR

This study extends the applicability of Conformal Prediction, a well-established method for uncertainty quantification, to federated graph learning and introduces a Variational Autoencoder-based approach for reconstructing missing neighbors to mitigate the negative impact of missing data.

Abstract

Graphs play a crucial role in data mining and machine learning, representing real-world objects and interactions. As graph datasets grow, managing large, decentralized subgraphs becomes essential, particularly within federated learning frameworks. These frameworks face significant challenges, including missing neighbor information, which can compromise model reliability in safety-critical settings. Deployment of federated learning models trained in such settings necessitates quantifying the uncertainty of the models. This study extends the applicability of Conformal Prediction (CP), a well-established method for uncertainty quantification, to federated graph learning. We specifically tackle the missing links issue in distributed subgraphs to minimize its adverse effects on CP set sizes. We discuss data dependencies across the distributed subgraphs and establish conditions for CP validity and precise test-time coverage. We introduce a Variational Autoencoder-based approach for reconstructing missing neighbors to mitigate the negative impact of missing data. Empirical evaluations on real-world datasets demonstrate the efficacy of our approach, yielding smaller prediction sets while ensuring coverage guarantees.
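For readers unfamiliar with CP, the following is a minimal sketch of generic split conformal prediction for classification, not the federated procedure proposed in the paper. It assumes the simple non-conformity score $1 - \hat{p}(y \mid x)$ and a small made-up calibration set; the function names `conformal_quantile` and `prediction_set` are illustrative and do not come from the paper.

```python
import math

def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: every label must be kept.
        return float("inf")
    return sorted(scores)[rank - 1]

def prediction_set(probs, q_hat):
    """Keep every label whose score 1 - p(label) is at most the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= q_hat]

# Hypothetical calibration data: model probability assigned to the true class.
calib_true_probs = [0.9, 0.8, 0.6, 0.95, 0.7, 0.85, 0.3, 0.75, 0.65, 0.4]
calib_scores = [1.0 - p for p in calib_true_probs]

alpha = 0.2  # target miscoverage, i.e. 80% nominal coverage
q_hat = conformal_quantile(calib_scores, alpha)

# An uncertain test point yields a larger set than a confident one.
uncertain_set = prediction_set([0.5, 0.45, 0.05], q_hat)
confident_set = prediction_set([0.9, 0.07, 0.03], q_hat)
print(q_hat, uncertain_set, confident_set)
```

Under exchangeability of calibration and test scores, sets built this way contain the true label with probability at least $1 - \alpha$. Less informative model outputs, such as those caused by missing neighbors, push the threshold up and enlarge the sets, which is the inefficiency the abstract refers to.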

Paper Structure

This paper contains 36 sections, 2 theorems, 32 equations, 7 figures, 5 tables, 1 algorithm.

Key Result

Lemma 1

Within the transductive learning setting, assuming permutation invariance in graph learning over the unordered graph $\mathcal{G}^k = (\mathcal{V}^k, \mathcal{E}^k)$, the set of non-conformity scores $\{s_v\}_{v \in \mathcal{V}^k_{\text{calib}} \cup \mathcal{V}^k_{\text{test}}}$ is invariant under p

Figures (7)

  • Figure 1: Overview of federated conformal prediction for graph-structured data. A scenario involving patient data shared across three hospitals. It distinguishes between intra-client (solid lines) and inter-client (dashed lines) interactions, with the former stored in hospital databases and the latter often missing in federated settings despite their real-life presence. The FedGNN model leverages a federated GNN to optimize a global model through local client updates. In contrast, the conventional GNN model is trained under an ideal scenario where all connections (both solid and dashed) are accessible, providing a benchmark for comparison. The figure also highlights how missing inter-client links contribute to inefficiencies in the conformal prediction set size, as demonstrated in the prediction sets $C^1_{\alpha}(X_{\text{test}})$ and $C^2_{\alpha}(X_{\text{test}})$.
  • Figure 2: Effect of the number of clients on CP set size for the Cora dataset.
  • Figure 3: Missing neighbor generation framework. Clients use VAEs to generate node features, apply K-means clustering for prototype selection, and share these with a central server. The server redistributes prototypes to enrich client subgraphs, aiding in link prediction with a global VGAE model.
  • Figure 4: Coverage rates for the Fed (left) and Gen (right) models across varying $K$ on the Cora dataset.
  • Figure 5: Heatmap of RAPS non-conformity scores for the Fed and Gen methods across various $\epsilon$ and $1-\alpha$ values on the 3-client Cora dataset.
  • ...and 2 more figures

Theorems & Definitions (3)

  • Lemma 1
  • Definition 1: Partial Exchangeability
  • Theorem 1