Learning to Approximate Adaptive Kernel Convolution on Graphs

Jaeyoon Sim; Sooyeon Jeon; InJun Choi; Guorong Wu; Won Hwa Kim

TL;DR

This work tackles oversmoothing in graph neural networks by introducing LSAP, which learns per-node diffusion scales to adapt the aggregation range using a heat-kernel form. By expressing the kernel via polynomial approximations and deriving closed-form derivatives of the coefficients with respect to the scale, LSAP enables end-to-end training without expensive Laplacian diagonalization. Empirically, LSAP achieves state-of-the-art or competitive results on standard node-classification benchmarks and excels on graph-classification tasks involving brain networks for Alzheimer's disease, while also providing interpretable, ROI-level diffusion ranges. The approach offers a scalable, interpretable alternative to global diffusion schemes with practical impact for large graphs and neuroimaging applications.
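As a rough illustration of the polynomial route described above, the following NumPy sketch approximates a heat-kernel convolution with node-wise scales by a truncated Chebyshev expansion and compares it against the result obtained from a full eigendecomposition. The function names, the toy graph, the quadrature-based coefficients, and the convention that row $p$ of the kernel uses scale $s_p$ are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def normalized_laplacian(A):
    """I - D^{-1/2} A D^{-1/2}; assumes every node has at least one edge."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def cheb_coeffs(s, order, num_quad=64):
    """Chebyshev coefficients of exp(-s_p * lam) on lam in [0, 2], one row per scale s_p."""
    theta = np.pi * (np.arange(num_quad) + 0.5) / num_quad
    lam = np.cos(theta) + 1.0                                # Chebyshev nodes mapped to [0, 2]
    basis = np.cos(np.outer(np.arange(order + 1), theta))    # T_n at the nodes, (order+1, M)
    c = (2.0 / num_quad) * np.exp(-np.outer(s, lam)) @ basis.T   # (N, order+1)
    c[:, 0] *= 0.5                                           # n = 0 term carries half weight
    return c

def approx_heat_conv(L, X, s, order=20):
    """sum_n diag(c_n) T_n(L - I) X, built with the three-term recurrence (no eigenvectors)."""
    c = cheb_coeffs(s, order)
    L_shift = L - np.eye(len(L))                             # spectrum moved from [0, 2] to [-1, 1]
    T_prev, T_curr = X, L_shift @ X
    H = c[:, [0]] * T_prev + c[:, [1]] * T_curr
    for n in range(2, order + 1):
        T_prev, T_curr = T_curr, 2.0 * L_shift @ T_curr - T_prev
        H += c[:, [n]] * T_curr
    return H

def exact_heat_conv(L, X, s):
    """Reference result via full diagonalization: K[p, q] = sum_i exp(-s_p lam_i) U[p, i] U[q, i]."""
    lam, U = np.linalg.eigh(L)
    K = (U * np.exp(-np.outer(s, lam))) @ U.T
    return K @ X

# Toy graph (hypothetical): ring of 8 nodes with two chords.
N = 8
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0
A[0, 4] = A[4, 0] = 1.0
A[2, 6] = A[6, 2] = 1.0

rng = np.random.default_rng(0)
X = rng.normal(size=(N, 3))          # random node features
s = np.linspace(0.2, 3.0, N)         # one diffusion scale per node
L = normalized_laplacian(A)
H_approx = approx_heat_conv(L, X, s)
H_exact = exact_heat_conv(L, X, s)
print("max abs difference:", np.max(np.abs(H_approx - H_exact)))
```

The approximate path only needs repeated products with the (shifted) Laplacian, which is where the savings over diagonalization would come from on large sparse graphs; dense arrays are used here purely to keep the sketch short.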

Abstract

Various Graph Neural Networks (GNNs) have been successful in analyzing data in non-Euclidean spaces; however, they have limitations such as oversmoothing, i.e., information becomes excessively averaged as the number of hidden layers increases. The issue stems from the intrinsic formulation of conventional graph convolution, where nodal features are aggregated from the direct neighborhood at each layer, uniformly across all nodes in the graph. As setting a different number of hidden layers per node is infeasible, recent works leverage a diffusion kernel to redefine the graph structure and incorporate information from farther nodes. Unfortunately, such approaches require either costly diagonalization of the graph Laplacian or learning a large transform matrix. In this regard, we propose a diffusion learning framework in which the range of feature aggregation is controlled by the scale of a diffusion kernel. For efficient computation, we derive closed-form derivatives of approximations of the graph convolution with respect to the scale, so that the node-wise range can be learned adaptively. With a downstream classifier, the entire framework is trainable in an end-to-end manner. Our model is tested on various standard datasets for node-wise classification, where it achieves state-of-the-art performance, and it is also validated on real-world brain network data for graph classification to demonstrate its practicality for Alzheimer's disease classification.
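The idea of learning a node-wise range by differentiating through the approximation can be sketched on a toy problem. In the NumPy example below, a squared-error loss stands in for the downstream classifier, the Chebyshev coefficients and their derivatives with respect to the scale are obtained by quadrature rather than by the paper's closed forms, and the scales are updated by plain gradient descent. All names (`cheb_coeffs_and_grads`, `cheb_features`, `loss_and_grad`), the path graph, the step size, and the clipping of scales at a small positive value are assumptions made for illustration.

```python
import numpy as np

def cheb_coeffs_and_grads(s, order, num_quad=64):
    """Chebyshev coefficients of exp(-s_p * lam) on [0, 2] and their derivatives w.r.t. s_p."""
    theta = np.pi * (np.arange(num_quad) + 0.5) / num_quad
    lam = np.cos(theta) + 1.0
    basis = np.cos(np.outer(np.arange(order + 1), theta))    # (order+1, M)
    f = np.exp(-np.outer(s, lam))                            # (N, M)
    weight = np.full(order + 1, 2.0 / num_quad)
    weight[0] = 1.0 / num_quad                               # n = 0 term carries half weight
    c = (f @ basis.T) * weight                               # (N, order+1)
    dc = ((-lam * f) @ basis.T) * weight                     # d/ds exp(-s lam) = -lam exp(-s lam)
    return c, dc

def cheb_features(L, X, order):
    """Stack T_n(L - I) X for n = 0..order; shape (order+1, N, F). Independent of the scales."""
    L_shift = L - np.eye(len(L))
    T = [X, L_shift @ X]
    for _ in range(2, order + 1):
        T.append(2.0 * L_shift @ T[-1] - T[-2])
    return np.stack(T)

def loss_and_grad(s, TX, Y):
    """Squared-error loss on H(s) and its gradient w.r.t. each node's scale."""
    c, dc = cheb_coeffs_and_grads(s, TX.shape[0] - 1)
    H = np.einsum("pn,npf->pf", c, TX)       # row p uses only the coefficients of s_p
    dH = np.einsum("pn,npf->pf", dc, TX)     # dH[p, :] / ds_p
    R = H - Y
    return 0.5 * np.sum(R ** 2), np.sum(R * dH, axis=1)

# Toy setup (hypothetical): path graph with 6 nodes, targets generated from known scales.
N, order = 6, 15
A = np.zeros((N, N))
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
L = np.eye(N) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

rng = np.random.default_rng(0)
X = 0.1 * rng.normal(size=(N, 3))
TX = cheb_features(L, X, order)
s_true = np.linspace(0.5, 2.0, N)
Y = np.einsum("pn,npf->pf", cheb_coeffs_and_grads(s_true, order)[0], TX)

s = np.ones(N)                               # initial scale for every node
initial_loss, _ = loss_and_grad(s, TX, Y)
for _ in range(2000):
    _, grad = loss_and_grad(s, TX, Y)
    s = np.maximum(s - 0.1 * grad, 1e-3)     # gradient step; keep scales positive
final_loss, _ = loss_and_grad(s, TX, Y)
print("loss before:", initial_loss, "after:", final_loss)
```

Because row $p$ of the output depends only on $s_p$ in this formulation, the gradient for each node's scale is computed from that node's own residual, and the polynomial features `TX` can be computed once and reused across updates.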

Paper Structure (38 sections, 6 theorems, 41 equations, 7 figures, 9 tables)

This paper contains 38 sections, 6 theorems, 41 equations, 7 figures, 9 tables.

Introduction
Related Works
Preliminaries
Learning to Approximate Kernel Convolution
Model Architecture
Convolution Layer.
Output Layer.
Model Update.
Gradients of Polynomial Coefficients with Scale
Chebyshev Polynomial.
Hermite Polynomial.
Laguerre Polynomial.
Semi-supervised Node Classification
Graph Classification
Experiments
...and 23 more sections

Key Result

Lemma 1

Consider an orthogonal polynomial $P_n$ over interval $[a,b]$ with inner product $\int_a^b P_n(\lambda)P_k(\lambda)w(\lambda)d\lambda=\delta_{nk}$, where $w(\lambda)$ is the weight function. If $P_n$ expands the heat kernel, the expansion coefficients $c_{\textbf{s},n}$ with respect to $\textbf{s}$
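The lemma statement above is cut off in this summary, so the sketch below does not reproduce its closed form. It only illustrates the general idea for one assumed basis: Chebyshev polynomials on $[0, 2]$, with coefficients computed by Chebyshev-Gauss quadrature. Differentiating the integrand with respect to the scale gives the derivative of each coefficient, which is then compared against a central finite difference. The last lines also check a three-term relation that follows from the standard identity $y\,T_n(y) = \tfrac{1}{2}\,(T_{n+1}(y) + T_{n-1}(y))$; it is stated here as a consequence of that identity, not as the paper's result. The function names and the chosen values of `s`, `order`, and `eps` are hypothetical.

```python
import numpy as np

def cheb_coeffs(s, order, num_quad=64):
    """Coefficients c_n(s) with exp(-s * lam) ~ sum_n c_n T_n(lam - 1) for lam in [0, 2]."""
    theta = np.pi * (np.arange(num_quad) + 0.5) / num_quad
    lam = np.cos(theta) + 1.0
    basis = np.cos(np.outer(np.arange(order + 1), theta))
    c = (2.0 / num_quad) * basis @ np.exp(-s * lam)
    c[0] *= 0.5
    return c

def cheb_coeff_grads(s, order, num_quad=64):
    """d c_n / d s, using d/ds exp(-s * lam) = -lam * exp(-s * lam) inside the quadrature."""
    theta = np.pi * (np.arange(num_quad) + 0.5) / num_quad
    lam = np.cos(theta) + 1.0
    basis = np.cos(np.outer(np.arange(order + 1), theta))
    dc = (2.0 / num_quad) * basis @ (-lam * np.exp(-s * lam))
    dc[0] *= 0.5
    return dc

s, order, eps = 1.5, 12, 1e-5
analytic = cheb_coeff_grads(s, order)
numeric = (cheb_coeffs(s + eps, order) - cheb_coeffs(s - eps, order)) / (2 * eps)
max_gap = np.max(np.abs(analytic - numeric))
print("max |analytic - finite difference|:", max_gap)

# For n >= 2: dc_n/ds = -c_n - (c_{n-1} + c_{n+1}) / 2, so derivatives reuse the coefficients.
c = cheb_coeffs(s, order + 1)
recurrence = -c[2:order + 1] - 0.5 * (c[1:order] + c[3:order + 2])
print("max |recurrence - analytic|:", np.max(np.abs(recurrence - analytic[2:])))
```

The practical point is that the derivative with respect to the scale can be expressed through quantities already needed for the forward pass, so no eigendecomposition or numerical differentiation is required during training.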

Figures (7)

Figure 1: Illustration of LSAP. A graph (as a normalized Laplacian $\hat{L}$) and node features $X$ are fed into the convolution layer. The output $H_K$ is passed to a downstream classifier, which yields a prediction $\hat{Y}$. The loss computed from $\hat{Y}$ is backpropagated to update the classifier and the convolution approximation, with $\textbf{s} = [s_1,\dots,s_N]$ adaptively adjusting the scale of each node.
Figure 2: Comparison of computation time (in ms) for one epoch (forward pass and backpropagation). Within each epoch, the time spent on heat kernel convolution is shown as a black bar. Results were obtained over 10 repetitions.
Figure 3: Visualization of the learned scales on the cortical regions of a brain, showing the scale of each ROI from the classification result using the FDG feature. Top: inner part of the right hemisphere. Bottom: outer part of the right hemisphere.
Figure 4: Effect of the number of layers $K$ on model performance. Left: accuracy of node classification on Cora. Right: accuracy of graph classification on ADNI.
Figure 5: Visualization of the learned scales on the cortical and sub-cortical regions of a brain, showing the scale of each ROI from the classification result using the cortical thickness feature.
...and 2 more figures

Theorems & Definitions (9)

Lemma 1
Lemma 2
Lemma 3
Lemma 1
proof
Lemma 2
proof
Lemma 3
proof
