
Learning Latent Graph Structures and their Uncertainty

Alessandro Manenti, Daniele Zambon, Cesare Alippi

TL;DR

Minimizing point-prediction losses does not guarantee that the latent relational information and its associated uncertainty are learned properly. By contrast, suitable loss functions on the stochastic model outputs provably solve two tasks at once: learning the unknown distribution of the latent graph and achieving optimal predictions of the target variable.

Abstract

Graph neural networks use relational information as an inductive bias to enhance prediction performance. Often, task-relevant relations are unknown, and graph structure learning approaches have been proposed to learn them from data. Given their latent nature, no graph observations are available to provide a direct training signal to the learnable relations. Therefore, graph topologies are typically learned on the prediction task alongside the other graph neural network parameters. In this paper, we demonstrate that minimizing point-prediction losses does not guarantee proper learning of the latent relational information and its associated uncertainty. Conversely, we prove that suitable loss functions on the stochastic model outputs make it possible to solve two tasks simultaneously: (i) learning the unknown distribution of the latent graph and (ii) achieving optimal predictions of the target variable. Finally, we propose a sampling-based method that solves this joint learning task. Empirical results validate our theoretical claims and demonstrate the effectiveness of the proposed approach.
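As a rough illustration of what "stochastic model outputs" means here, the sketch below draws adjacency matrices with independent Bernoulli edges (the edge model described in Figure 1) and produces one prediction per sampled graph. It assumes a hypothetical one-layer linear message-passing model $y = A x W$; the function names `sample_adjacency` and `stochastic_predictions` are illustrative and this is not the authors' implementation.

```python
import numpy as np

def sample_adjacency(theta, rng):
    """Draw one adjacency matrix whose entries are independent Bernoulli(theta_ij)."""
    return (rng.random(theta.shape) < theta).astype(float)

def stochastic_predictions(x, theta, weight, num_samples, rng):
    """Return num_samples predictions, one per sampled graph.

    Hypothetical one-layer model: y = A @ x @ weight, with A ~ P_A^theta.
    """
    preds = []
    for _ in range(num_samples):
        a = sample_adjacency(theta, rng)
        preds.append(a @ x @ weight)
    # Stack into shape (num_samples, num_nodes, out_features).
    return np.stack(preds)

rng = np.random.default_rng(0)
num_nodes, in_features, out_features = 4, 3, 2
theta = np.full((num_nodes, num_nodes), 0.3)   # edge probabilities
x = rng.normal(size=(num_nodes, in_features))  # node features
weight = rng.normal(size=(in_features, out_features))
samples = stochastic_predictions(x, theta, weight, num_samples=16, rng=rng)
print(samples.shape)  # (16, 4, 2)
```

The set of samples approximates the model's predictive distribution, which a distributional loss can compare against observed targets. Training $\theta$ through discrete samples additionally requires a gradient estimator (for example, a score-function estimator); this sketch covers only the forward sampling.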

Paper Structure (45 sections, 5 theorems, 37 equations, 15 figures, 4 tables)

Key Result

Proposition 4.1

Under the in-family inclusion assumption, the loss function $\mathcal{L}^{point}(\theta,\psi)$ defined in Eq. (T-bayes-rules) is minimized by all $\theta$ and $\psi$ such that $T[P_{y|x}^{\theta,\psi}] = T[P^*_{y|x}]$ almost surely in $x$ and, in particular,
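A hypothetical toy example (not taken from the paper) makes the point of Proposition 4.1 concrete. Take the squared-error loss, for which the statistic $T$ is the conditional mean, and the model $y = (a_1 + a_2)\,x$ with independent edges $a_i \sim \mathrm{Bernoulli}(\theta_i)$. The helper names `output_distribution` and `mean_and_variance` are illustrative. Enumerating all graphs exactly shows that two different edge-probability vectors give the same mean, so a loss on the point prediction cannot distinguish them, while their output distributions differ.

```python
from itertools import product

def output_distribution(theta, x):
    """Exact distribution of y = (a_1 + ... + a_k) * x, a_i ~ Bernoulli(theta_i) independent."""
    dist = {}
    for a in product([0, 1], repeat=len(theta)):
        prob = 1.0
        for a_i, t_i in zip(a, theta):
            prob *= t_i if a_i else 1.0 - t_i
        y = sum(a) * x
        dist[y] = dist.get(y, 0.0) + prob
    # Drop outcomes that have zero probability.
    return {y: p for y, p in dist.items() if p > 0}

def mean_and_variance(dist):
    mean = sum(y * p for y, p in dist.items())
    var = sum((y - mean) ** 2 * p for y, p in dist.items())
    return mean, var

x = 2.0
# Both edge distributions yield the same mean (the MSE-optimal point prediction)
# but different variances, i.e. different latent-graph uncertainty.
for theta in [(0.5, 0.5), (1.0, 0.0)]:
    dist = output_distribution(theta, x)
    print(theta, dist, mean_and_variance(dist))
```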

Figures (15)

  • Figure 1: Adjacency matrices sampled from $P_A^*=P_A^{\theta^*}$ for the experiment in the Experiments section are subgraphs of the top graph; the picture shows 3 communities of an arbitrarily large graph. Each orange edge is independently sampled with probability $\theta^*_{ij}$; the parameters $\theta^*_{ij}$ defining the edge probabilities are shown at the bottom for a two-community graph.
  • Figure 2: Validation losses $\mathcal{L}^{dist}$, $\mathcal{L}^{cal}$ and $\mathcal{L}^{point}$ during training. At epoch 5, the learning rate is decreased to ensure convergence. $\mathcal{L}^{dist}$ in the MMD prediction-loss subfigure is negative because the third term of the Monte Carlo-friendly definition of $\mathrm{MMD}^2$ is constant and is therefore omitted.
  • Figure 3: The adjacency matrices used in this paper are sampled from this graph. Each edge in orange is independently sampled with probability $\theta^*$. In the picture, 3 communities of an arbitrarily large graph are shown.
  • Figure 4: $\theta^*_{ij}$ parameters for each edge of the latent adjacency matrix. Each square corresponds to an edge, and the number inside is the probability of sampling that edge for each prediction.
  • Figure 5: The learned parameters for the latent distribution corresponding to the stochastic adjacency matrix.
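The Figure 2 caption explains that $\mathcal{L}^{dist}$ can be negative because a constant term of $\mathrm{MMD}^2$ is dropped. The sketch below assumes the standard expansion $\mathrm{MMD}^2 = \mathbb{E}[k(\hat y,\hat y')] - 2\,\mathbb{E}[k(\hat y,y)] + \mathbb{E}[k(y,y')]$ with a Gaussian kernel; the paper's actual kernel, bandwidth and estimator are not stated here, and the function names are hypothetical. Omitting $\mathbb{E}[k(y,y')]$, which does not depend on the model, leaves the minimizer unchanged but shifts the value below zero.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Pairwise Gaussian kernel between the rows of a (n, d) and b (m, d)."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2_without_constant(model_samples, target_samples, bandwidth=1.0):
    """Monte Carlo MMD^2 estimate with the target-target term dropped.

    Keeps E[k(y_hat, y_hat')] - 2 E[k(y_hat, y)]. Requires at least two
    model samples. The result can be negative.
    """
    n = model_samples.shape[0]
    k_mm = gaussian_kernel(model_samples, model_samples, bandwidth)
    # Within-model term: exclude the diagonal (i == j) pairs.
    term_mm = (k_mm.sum() - np.trace(k_mm)) / (n * (n - 1))
    term_mt = gaussian_kernel(model_samples, target_samples, bandwidth).mean()
    return term_mm - 2.0 * term_mt

rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=(200, 1))
matched = rng.normal(0.0, 1.0, size=(200, 1))
shifted = rng.normal(3.0, 1.0, size=(200, 1))
print(mmd2_without_constant(matched, target))  # lower: matches the target distribution
print(mmd2_without_constant(shifted, target))  # higher: distribution is shifted
```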

Theorems & Definitions (10)

  • Proposition 4.1
  • Theorem 5.2
  • Proof of the proposition that optimal predictions do not guarantee calibration
  • Proof of the theorem relating $\mathcal{L}^{dist}$, $\mathcal{L}^{point}$ and $\mathcal{L}^{cal}$
  • Corollary 1.1
  • Proof
  • Proposition 1.2
  • Lemma 1.3
  • Proof
  • Proof of the proposition on $\mathcal{L}^{dist}$ for calibration (part 2)