Derivation of Back-propagation for Graph Convolutional Networks using Matrix Calculus and its Application to Explainable Artificial Intelligence

Yen-Che Hsiao; Rongting Yue; Abhishek Dutta

TL;DR

This paper derives an analytic, closed-form back-propagation gradient for Graph Convolutional Networks using matrix calculus, extending to arbitrary depths and arbitrary element-wise activation functions. It targets two canonical graph tasks: node classification and link prediction, and validates the gradient by comparing against reverse-mode automatic differentiation, showing median SSE in the range $10^{-18}$ to $10^{-14}$. The authors provide explicit matrix-based expressions leveraging Kronecker, Hadamard, and permutation matrices, and they extend the framework to sensitivity analysis for explainable AI. While the method incurs higher computational cost than AD, it provides exact gradient expressions and a foundation for interpretable gradient-based optimization in GCNs.

Abstract

This paper provides a comprehensive and detailed derivation of the backpropagation algorithm for graph convolutional neural networks using matrix calculus. The derivation is extended to include arbitrary element-wise activation functions and an arbitrary number of layers. The study addresses two fundamental problems, namely node classification and link prediction. To validate our method, we compare it with reverse-mode automatic differentiation. The experimental results demonstrate that the median sum of squared errors of the updated weight matrices, when comparing our method to the approach using reverse-mode automatic differentiation, falls within the range of $10^{-18}$ to $10^{-14}$. These outcomes are obtained from conducting experiments on a five-layer graph convolutional network, applied to a node classification problem on Zachary's karate club social network and a link prediction problem on a drug-drug interaction network. Finally, we show how the derived closed-form solution can facilitate the development of explainable AI and sensitivity analysis.
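The validation idea in the abstract can be illustrated on a toy problem. The sketch below is not the paper's general multi-layer expression: it assumes a 1-layer GCN with a sigmoid output, a summed binary cross-entropy loss, and the symmetric normalization $\mathbf{D}^{-1/2}(\mathbf{A}+\mathbf{I})\mathbf{D}^{-1/2}$, none of which are stated in this section. It also substitutes central finite differences for reverse-mode automatic differentiation so that it depends only on NumPy, and it compares gradients rather than updated weight matrices. The graph, feature sizes, and all function names are hypothetical.

```python
import numpy as np

def normalized_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2} (normalization assumed for this sketch)."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(W, A_hat, H0, Y):
    """Summed binary cross-entropy of the 1-layer GCN output sigmoid(A_hat H0 W)."""
    Y_hat = sigmoid(A_hat @ H0 @ W)
    return -np.sum(Y * np.log(Y_hat) + (1.0 - Y) * np.log(1.0 - Y_hat))

def analytic_grad(W, A_hat, H0, Y):
    """Closed-form dL/dW = (A_hat H0)^T (Y_hat - Y) for sigmoid + summed BCE."""
    Y_hat = sigmoid(A_hat @ H0 @ W)
    return (A_hat @ H0).T @ (Y_hat - Y)

def numeric_grad(W, A_hat, H0, Y, eps=1e-6):
    """Central finite differences, used here as a stand-in for reverse-mode AD."""
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        step = np.zeros_like(W)
        step[idx] = eps
        G[idx] = (bce_loss(W + step, A_hat, H0, Y)
                  - bce_loss(W - step, A_hat, H0, Y)) / (2.0 * eps)
    return G

# Hypothetical 4-node graph with 3 input features and binary node labels.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = normalized_adjacency(A)
H0 = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 1))
Y = np.array([[0.0], [0.0], [1.0], [1.0]])

# Sum of squared errors between the two gradients.
sse = np.sum((analytic_grad(W, A_hat, H0, Y) - numeric_grad(W, A_hat, H0, Y)) ** 2)
print("SSE between closed-form and numerical gradient:", sse)
```

The SSE here reflects finite-difference error rather than the floating-point disagreement between two exact methods, so it should be read as a sanity check and not as a reproduction of the reported $10^{-18}$ to $10^{-14}$ range.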

Paper Structure (28 sections, 1 theorem, 67 equations, 10 figures)

Introduction
Back-propagation of Graph Convolutional Network
Binary classification of nodes
Backpropagation for 3-layer GCN with ReLU and sigmoid activation function
Back-propagation for multi-layer GCN with ReLU and sigmoid activation function
Back-propagation for multi-layer GCN with arbitrary activation functions
Link prediction
Experiments
Node classification
1-layer GCN with identity function and sigmoid activation function
Link prediction
2-layer GCN
Conclusion
Basic notation and properties of Kronecker product and matrix calculus
Basic notation
...and 13 more sections

Key Result

Theorem 2.3

Let $\mathbf{F}: \mathbb{R}^{p \times q} \rightarrow \mathbb{R}^{m \times n}$ be an $m\times n$ matrix-valued function of a $p\times q$ matrix $\mathbf{W}\in\mathbb{R}^{p \times q}$. The derivative of $\mathbf{\Sigma}(\mathbf{F}(\mathbf{W}))$ with respect to $\mathbf{W}$ can be written as … where $\otimes$ is the Kronecker product in (Kronecker), $\odot$ is the Hadamard product in (Hadama…
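The theorem's formula is not reproduced in this summary, so the following sketch does not implement it. It only illustrates the general idea of differentiating an element-wise function of a matrix-valued function with Kronecker products, using the standard column-major $\operatorname{vec}$ Jacobian convention, which may differ from the layout used in the paper. It assumes the special case $\mathbf{F}(\mathbf{W}) = \mathbf{X}\mathbf{W}$ with a sigmoid as the element-wise function, for which $\partial \operatorname{vec}(\sigma(\mathbf{X}\mathbf{W})) / \partial \operatorname{vec}(\mathbf{W})^{\top} = \operatorname{diag}(\operatorname{vec}(\sigma'(\mathbf{X}\mathbf{W})))\,(\mathbf{I}_q \otimes \mathbf{X})$. The matrix $\mathbf{X}$ and all function names are hypothetical.

```python
import numpy as np

def vec(M):
    """Column-major vectorization, matching vec(AXB) = (B^T kron A) vec(X)."""
    return M.reshape(-1, order="F")

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def jacobian_closed_form(X, W):
    """d vec(sigmoid(XW)) / d vec(W)^T = diag(vec(sigmoid'(XW))) (I_q kron X)."""
    S = sigmoid(X @ W)
    dS = S * (1.0 - S)                      # element-wise derivative of the sigmoid
    q = W.shape[1]
    return np.diag(vec(dS)) @ np.kron(np.eye(q), X)

def jacobian_numeric(X, W, eps=1e-6):
    """Central finite differences, one column per entry of vec(W)."""
    w0 = vec(W)
    J = np.zeros((X.shape[0] * W.shape[1], W.size))
    for k in range(W.size):
        step = np.zeros_like(w0)
        step[k] = eps
        W_plus = (w0 + step).reshape(W.shape, order="F")
        W_minus = (w0 - step).reshape(W.shape, order="F")
        J[:, k] = (vec(sigmoid(X @ W_plus)) - vec(sigmoid(X @ W_minus))) / (2.0 * eps)
    return J

# Hypothetical sizes: X is m x p, W is p x q, so F(W) = XW is m x q.
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))

J_closed = jacobian_closed_form(X, W)
J_num = jacobian_numeric(X, W)
print("max abs difference:", np.max(np.abs(J_closed - J_num)))
```

The diagonal factor plays the role of an element-wise (Hadamard) scaling of the linear Jacobian $\mathbf{I}_q \otimes \mathbf{X}$; forming it as a dense matrix is wasteful and is done here only to keep the comparison explicit.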

Figures (10)

Figure 1: (a) The evolution of the sum of squared error between the trainable weight matrix obtained from our method and the matrix obtained using reverse-mode automatic differentiation in Section \ref{Node1GCN}. (b) The evolution of the sum of squared error between the two trainable weight matrices obtained from our method and the matrices obtained using reverse-mode automatic differentiation in Section \ref{Link1GCN}.
Figure 2: (a) The evolution of the absolute sum of the sensitivity of the loss with respect to the input feature matrix $\mathbf{H}_{0}$ in Section \ref{Node1GCN}. (b) The heat map of the sensitivity of the prediction for the link between node 2 and node 7 with respect to the input feature matrix $\mathbf{H}_{0}$ in Section \ref{Link1GCN}.
Figure 3: (a) Zachary’s karate club social network \cite{zachary1977information}. The node colors signify classes, with blue representing class 0 and orange representing class 1. (b) Drug-drug interaction network. Black links between nodes represent graph edges.
Figure 4: Evolution of karate club network node classification obtained from a 1-layer GCN model after $100$ training iterations. The training loss, accuracy, and classification results exhibit similar trends when using either reverse-mode automatic differentiation or our matrix-based method. Each node is drawn as a circle labeled with its node number. Colors represent classes, where blue represents class 0 and orange represents class 1. Black links between nodes represent graph edges.
Figure 5: (a) The line plot of the 1060 evolution of the sum of squared error (SSE) between the trainable weight matrices obtained from our method and those obtained using reverse-mode automatic differentiation in Section \ref{Node5GCN}. (b) The box plot of the 1060 evolution of the SSE between the trainable weight matrices obtained from our method and those obtained using reverse-mode automatic differentiation in Section \ref{Node5GCN}.
...and 5 more figures
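
The sensitivity quantities plotted in Figure 2 are derivatives with respect to the input feature matrix $\mathbf{H}_{0}$. As a rough illustration only, the sketch below assumes a 1-layer GCN with a sigmoid output and a summed binary cross-entropy loss, for which the sensitivity has the closed form $\partial L / \partial \mathbf{H}_{0} = \hat{\mathbf{A}}^{\top}(\hat{\mathbf{Y}} - \mathbf{Y})\mathbf{W}^{\top}$. The model, the normalization of $\hat{\mathbf{A}}$, the toy graph, and all names are assumptions that are not taken from the paper, and the check uses central finite differences in place of automatic differentiation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(H0, A_hat, W, Y):
    """Summed binary cross-entropy of sigmoid(A_hat H0 W), viewed as a function of H0."""
    Y_hat = sigmoid(A_hat @ H0 @ W)
    return -np.sum(Y * np.log(Y_hat) + (1.0 - Y) * np.log(1.0 - Y_hat))

def sensitivity_closed_form(H0, A_hat, W, Y):
    """dL/dH0 = A_hat^T (Y_hat - Y) W^T for the assumed 1-layer model."""
    Y_hat = sigmoid(A_hat @ H0 @ W)
    return A_hat.T @ (Y_hat - Y) @ W.T

def sensitivity_numeric(H0, A_hat, W, Y, eps=1e-6):
    """Central finite differences over every entry of H0."""
    S = np.zeros_like(H0)
    for idx in np.ndindex(*H0.shape):
        step = np.zeros_like(H0)
        step[idx] = eps
        S[idx] = (bce_loss(H0 + step, A_hat, W, Y)
                  - bce_loss(H0 - step, A_hat, W, Y)) / (2.0 * eps)
    return S

# Hypothetical 3-node path graph, normalized as D^{-1/2} (A + I) D^{-1/2} (assumed).
A_tilde = np.array([[1, 1, 0],
                    [1, 1, 1],
                    [0, 1, 1]], dtype=float)
d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

rng = np.random.default_rng(2)
H0 = rng.normal(size=(3, 2))
W = rng.normal(size=(2, 1))
Y = np.array([[0.0], [1.0], [1.0]])

S_closed = sensitivity_closed_form(H0, A_hat, W, Y)
S_num = sensitivity_numeric(H0, A_hat, W, Y)
abs_sum = np.abs(S_closed).sum()   # scalar summary, in the spirit of Figure 2(a)
print("absolute sum of sensitivity:", abs_sum)
```

Each entry of `S_closed` indicates how strongly the loss responds to one feature of one node, which is the kind of per-entry map a heat map such as Figure 2(b) would visualize.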

Theorems & Definitions (4)

Definition 2.1
Definition 2.2
Theorem 2.3
proof
