Graph neural networks and non-commuting operators

Mauricio Velasco, Kaiying O'Hare, Bernardo Rychtenberg, Soledad Villar

TL;DR

A limit theory of graphon-tuple neural networks is developed and used to prove a universal transferability theorem that guarantees that all graph-tuple neural networks are transferable on convergent graph-tuple sequences.

Abstract

Graph neural networks (GNNs) provide state-of-the-art results in a wide variety of tasks which typically involve predicting features at the vertices of a graph. They are built from layers of graph convolutions which serve as a powerful inductive bias for describing the flow of information among the vertices. Often, more than one data modality is available. This work considers a setting in which several graphs have the same vertex set and a common vertex-level learning task. This generalizes standard GNN models to GNNs with several graph operators that do not commute. We may call this model graph-tuple neural networks (GtNN). In this work, we develop the mathematical theory to address the stability and transferability of GtNNs using properties of non-commuting non-expansive operators. We develop a limit theory of graphon-tuple neural networks and use it to prove a universal transferability theorem that guarantees that all graph-tuple neural networks are transferable on convergent graph-tuple sequences. In particular, there is no non-transferable energy under the convergence we consider here. Our theoretical results extend well-known transferability theorems for GNNs to the case of several simultaneous graphs (GtNNs) and provide a strict improvement on what is currently known even in the GNN case. We illustrate our theoretical results with simple experiments on synthetic and real-world data. To this end, we derive a training procedure that provably enforces the stability of the resulting model.

Paper Structure

This paper contains 20 sections, 14 theorems, 62 equations, 2 figures.

Key Result

Theorem 1

Suppose $\vec{W}$ and $\vec{Z}$ are two nonexpansive operator $k$-tuples. For positive integers $A,B$ let $H$ be any $B\times A$ matrix with entries in $\mathbb{R}\langle X_1,\dots, X_k\rangle$. The operator-tuple neural layer with ReLU activation defined by $H$ satisfies the following perturbation
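To make the objects in this statement concrete (the layer only, not the bound), the following is a minimal NumPy sketch of an operator-tuple layer. It assumes that each entry of $H$ is stored as a dictionary mapping a word in the operator indices to a real coefficient, and that the layer sums the filtered input features and applies ReLU entrywise. The function names `apply_word`, `apply_poly`, and `layer`, the dictionary encoding, and the two toy graphs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def apply_word(ops, word, x):
    """Apply the monomial X_{i1} X_{i2} ... X_{im} to x.

    `word` is a tuple of operator indices; the rightmost factor acts first.
    The empty word () is the identity.
    """
    y = x
    for i in reversed(word):
        y = ops[i] @ y
    return y

def apply_poly(ops, poly, x):
    """Apply a noncommutative polynomial, given as {word: coefficient}, to x."""
    out = np.zeros_like(x, dtype=float)
    for word, coeff in poly.items():
        out = out + coeff * apply_word(ops, word, x)
    return out

def layer(ops, H, X):
    """Hypothetical operator-tuple layer.

    H is a B x A nested list of polynomials, X is an (n, A) array of input
    features. Returns an (n, B) array: ReLU of the summed filtered features.
    """
    n = X.shape[0]
    out = np.zeros((n, len(H)))
    for b, row in enumerate(H):
        for a, poly in enumerate(row):
            out[:, b] += apply_poly(ops, poly, X[:, a])
    return np.maximum(out, 0.0)

# Two graphs on the same three vertices: edge {0,1} and edge {1,2}.
W1 = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
W2 = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
ops = [W1, W2]

x = np.array([1.0, 0.0, 0.0])
y12 = apply_word(ops, (0, 1), x)  # X_1 X_2 applied to x
y21 = apply_word(ops, (1, 0), x)  # X_2 X_1 applied to x

# One input feature (A = 1), two output features (B = 2).
H = [[{(0, 1): 1.0}],
     [{(1, 0): 1.0, (): -0.5}]]
Y = layer(ops, H, x[:, None])
```

Here both adjacency matrices have operator norm 1, and the monomials $X_1X_2$ and $X_2X_1$ act differently on the same signal, which is why the filters are taken from the noncommutative polynomial ring $\mathbb{R}\langle X_1,\dots,X_k\rangle$ rather than from ordinary polynomials.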

Figures (2)

  • Figure 1: We assess the tightness of our theoretical results on a regression problem on a synthetic toy example consisting of two weighted circulant graphs. See Appendix \ref{app.experiment details stability} for details. (Left) Numerical stability bound $C(h)$ (dashed) and stability metrics $\|h(\vec{T})\|_{\rm op}$ (solid) with respect to input signal perturbation as a function of the number of epochs for both the standard (1-layer) GtNN (orange) and (1-layer) stable GtNN (blue). (Middle) Similar plot for the stability metrics with respect to the graph perturbation $\|h(\vec{W})-h(\vec{Z})\|_{\rm op}$ and its upper bound (Lemma \ref{lem: operator_nonexpansive} part 2 and 3b). For this plot we take $\vec{W} = \vec{T}$, and $\vec{Z}$ is a random perturbation of $\vec{T}$ with $\|Z_1 - W_1\|_{\rm op} \approx \|Z_2 - W_2\|_{\rm op} \approx 0.33$. (Right) For all four models, we compute the 2-norm of the vector of output perturbations from Equation \ref{eq.perturbation} over the test set for various sizes of graph perturbation $(\|T_1 - W_1\|_{\rm op} + \|T_2 - W_2\|_{\rm op})/2$, where the additive graph perturbations $T_1 - W_1$ and $T_2 - W_2$ are symmetric matrices with iid Gaussian entries. In addition, each $T_j$ and $W_j$ is normalized such that $\|T_j\|_{\rm op} \leq 1$ and $\|W_j\|_{\rm op} \leq 1$ for $j=1,2$, so they are nonexpansive operator-tuple networks. (All) We observe that adding stability constraints does not affect the prediction performance: the testing R squared value for GtNN is $0.6866$, while for stable GtNN it is $0.6543$.
  • Figure 3: Mean squared error (MSE) on the test set (with testing graph of size $n=300$) as a function of the number of training epochs for (Left) (1-layer) GtNN and (Middle) (1-layer) stable GtNN. In both plots we depict the performance of five different models, trained with graphs of sizes $m = 100, 150, 200, 250, 300$ respectively. (Right) Comparison of testing MSE between (1-layer) GtNN (blue) and (1-layer) stable GtNN (orange) for training graphs of size $m=100$ as a function of the number of epochs.
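The graph perturbation size used in the Figure 1 caption, $(\|T_1 - W_1\|_{\rm op} + \|T_2 - W_2\|_{\rm op})/2$, together with the normalization $\|\cdot\|_{\rm op} \leq 1$, can be sketched as follows. This is a rough illustration under stated assumptions, not the experimental code: the circulant weights, the noise scale, the symmetrization $(G + G^\top)/2$, and all helper names are hypothetical choices made for the example.

```python
import numpy as np

def op_norm(M):
    """Operator (spectral) norm: the largest singular value of M."""
    return np.linalg.norm(M, 2)

def make_nonexpansive(M):
    """Rescale M so that its operator norm is at most 1."""
    s = op_norm(M)
    return M / s if s > 1.0 else M

def circulant(first_row):
    """Circulant matrix whose i-th row is first_row shifted right by i."""
    c = np.asarray(first_row, dtype=float)
    return np.array([np.roll(c, i) for i in range(len(c))])

def symmetric_gaussian(n, scale, rng):
    """Symmetric noise matrix built from iid Gaussian entries.

    Assumption: symmetrized as (G + G^T) / 2, one of several possible choices.
    """
    G = rng.normal(scale=scale, size=(n, n))
    return (G + G.T) / 2.0

def perturbation_size(T, W):
    """Average operator-norm distance between two operator tuples."""
    return sum(op_norm(t - w) for t, w in zip(T, W)) / len(T)

rng = np.random.default_rng(0)

# Hypothetical weighted circulant graphs on 6 vertices (symmetric weights).
raw = [circulant([0, 1, 0.5, 0, 0.5, 1]), circulant([0, 0.2, 1, 0, 1, 0.2])]
W = [make_nonexpansive(M) for M in raw]

# Perturbed tuple: add symmetric Gaussian noise, then renormalize.
T = [make_nonexpansive(M + symmetric_gaussian(6, 0.1, rng)) for M in W]

size = perturbation_size(T, W)
```

Note that renormalizing the perturbed matrices changes the realized perturbation size slightly, so in this sketch `size` is measured after normalization rather than set in advance.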

Theorems & Definitions (31)

  • Theorem 1
  • Corollary 2
  • Corollary 3
  • Theorem 4
  • Theorem 5
  • Example 6
  • Theorem 7
  • Theorem 8
  • Lemma 9
  • proof
  • ...and 21 more