Unitary convolutions for learning on graphs and groups

Bobak T. Kiani; Lukas Fesser; Melanie Weber

TL;DR

This work introduces unitary group convolutions to address instability and over-smoothing in deep group-convolutional neural networks, with a focus on graphs. It proposes two concrete operators, UniConv and Lie UniConv, built via the exponential map to ensure unitary, norm-preserving transformations that are invertible and equivariant, and extends the approach to generalized group convolutions. The authors provide theoretical guarantees (invertibility, isometry, equivariance, Rayleigh-quotient invariance, and dynamical isometry) and demonstrate empirically that unitary GCNs achieve competitive or superior performance on long-range and heterophilous graph tasks while enabling deeper architectures. The results suggest that unitary, norm-preserving convolutions improve stability and learning of long-range dependencies, with potential applicability to broader group symmetries and robustness considerations in geometric ML.
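To make the norm-preservation claim concrete, the following is a minimal sketch, not the authors' implementation. It assumes a 6-node ring graph and approximates $\exp(i{\bm{A}})$ with a truncated Taylor series, then checks numerically that the result is unitary and preserves the norm of a feature vector. The helper names (`matmul`, `expm_taylor`, `ring_adjacency`, `conj_transpose`) and the choice of 40 series terms are illustrative assumptions.

```python
def matmul(a, b):
    """Plain dense matrix product of two nested lists."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def expm_taylor(m, terms=40):
    """Truncated Taylor series sum_k m^k / k!; adequate only for small-norm matrices."""
    n = len(m)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = matmul(term, m)
        term = [[v / k for v in row] for row in term]  # now m^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def ring_adjacency(n):
    """Adjacency matrix of an undirected ring on n nodes (assumed example graph)."""
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][(i + 1) % n] = 1.0
        adj[i][(i - 1) % n] = 1.0
    return adj

def conj_transpose(m):
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]

n = 6
A = ring_adjacency(n)
U = expm_taylor([[1j * v for v in row] for row in A])  # U approximates exp(iA)

# Unitarity check: U^dagger U should be the identity.
gram = matmul(conj_transpose(U), U)
unitarity_error = max(abs(gram[i][j] - (i == j)) for i in range(n) for j in range(n))

# Norm preservation check on an arbitrary feature vector.
x = [complex(i + 1) for i in range(n)]
y = [sum(U[i][j] * x[j] for j in range(n)) for i in range(n)]
norm_x = sum(abs(v) ** 2 for v in x) ** 0.5
norm_y = sum(abs(v) ** 2 for v in y) ** 0.5
```

Because ${\bm{A}}$ is symmetric, $i{\bm{A}}$ is skew-Hermitian, which is why its exponential is unitary. A practical implementation would likely use a library matrix exponential rather than a hand-rolled series.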

Abstract

Data with geometric structure is ubiquitous in machine learning, often arising from fundamental symmetries in a domain, such as permutation-invariance in graphs and translation-invariance in images. Group-convolutional architectures, which encode symmetries as inductive bias, have shown great success in applications, but can suffer from instabilities as their depth increases and often struggle to learn long-range dependencies in data. For instance, graph neural networks experience instability due to the convergence of node representations (over-smoothing), which can occur after only a few iterations of message-passing, reducing their effectiveness in downstream tasks. Here, we propose and study unitary group convolutions, which allow for deeper networks that are more stable during training. The main focus of the paper is graph neural networks, where we show that unitary graph convolutions provably avoid over-smoothing. Our experimental results confirm that unitary graph convolutional networks achieve competitive performance on benchmark datasets compared to state-of-the-art graph neural networks. We complement our analysis of the graph domain with the study of general unitary convolutions and analyze their role in enhancing stability in general group convolutional architectures.

Paper Structure (52 sections, 7 theorems, 68 equations, 7 figures, 8 tables, 3 algorithms)

Introduction
Related work
Background and Notation
Group Theory Basics
Graph Neural Networks
Group-Convolutional Neural Networks
Unitary Group Convolutions
Unitary graph convolution
Implementing the exponential map
Generalized unitary convolutions
Properties and theoretical guarantees
Oversmoothing
Vanishing/Exploding gradients
Experiments
Toy model: graph distance
...and 37 more sections

Key Result

Proposition 3

Let $f_{{\operatorname{conv}}}:\mathbb{R}^{n \times d} \to \mathbb{R}^{n \times d}$ be a graph convolution layer of the form $f_{{\operatorname{conv}}}({\bm{X}}, {\bm{A}}) = {\bm{X}}{\bm{W}}_0 + {\bm{A}}{\bm{X}}{\bm{W}}_1$, where ${\bm{W}}_0, {\bm{W}}_1 \in \mathbb{R}^{d \times d}$ are parameterized matrices. The linear map $f_{{\operatorname{conv}}}(\cdot, {\bm{A}}):\mathbb{R}^{n \times d} \to \mathbb{R}^{n \times d}$ is orthogonal for all adjacency matrices ${\bm{A}}$ of undirected graphs only if ${\bm{W}}_1= \bm 0$
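A small numerical illustration of why the condition must hold for all graphs at once. This is a sketch under stated assumptions: the layer is taken to have the form ${\bm{X}}{\bm{W}}_0 + {\bm{A}}{\bm{X}}{\bm{W}}_1$ with scalar features ($d=1$, so the weights are scalars `w0` and `w1`), and only two graphs on two nodes are tested (no edge, one edge). The function names are hypothetical.

```python
def conv(x, adj, w0, w1):
    """Apply f(x) = w0 * x + w1 * (A x) for scalar node features (d = 1)."""
    n = len(x)
    ax = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
    return [w0 * x[i] + w1 * ax[i] for i in range(n)]

def is_isometry(adj, w0, w1, tol=1e-9):
    """Check M^T M = I for the layer's matrix M by applying it to basis vectors."""
    n = len(adj)
    cols = [conv([1.0 if i == k else 0.0 for i in range(n)], adj, w0, w1)
            for k in range(n)]
    for a in range(n):
        for b in range(n):
            dot = sum(cols[a][i] * cols[b][i] for i in range(n))
            target = 1.0 if a == b else 0.0
            if abs(dot - target) > tol:
                return False
    return True

EMPTY = [[0, 0], [0, 0]]  # two isolated nodes
EDGE = [[0, 1], [1, 0]]   # two nodes joined by one edge

def orthogonal_on_both(w0, w1):
    """True only if the layer is orthogonal on both test graphs."""
    return is_isometry(EMPTY, w0, w1) and is_isometry(EDGE, w0, w1)
```

With `w0 = 0, w1 = 1` the layer is orthogonal on the single-edge graph alone (it swaps the two nodes), yet it maps everything to zero on the empty graph. Requiring orthogonality on both graphs leaves only `w1 = 0` with `w0 = ±1` in this scalar case.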

Figures (7)

Figure 1: Comparison of standard linear message passing with iterates ${\bm{x}}_{L+1}=c({\bm{x}}_L + {\bm{A}} {\bm{x}}_L)$ versus unitary message passing with iterates ${\bm{x}}_{L+1}=\exp(i{\bm{A}}){\bm{x}}_L$ for a graph of $80$ nodes connected as a ring. The unitary message passing has a wave-like nature which ensures messages "propagate" through the graph. In contrast, the standard message passing has a unique fixed point corresponding to the all ones vector which inherently causes oversmoothing in the features. Here, $c$ is chosen to ensure the operator norm of the matrix ${\bm{I}}+{\bm{A}}$ is bounded by one.
Figure 2: (a) Example datapoint on $n=25$ nodes; the target is $y=5$ (distance between red nodes). (b) Results for the ring toy model problem with $100$ nodes where the unitary GCN with UniConv or Lie UniConv layers is the only message passing architecture able to learn successfully. Best performance over networks with $5$, $10$, and $20$ layers is plotted. Other architectures typically perform best with $5$ layers and only learn shorter distances (see the appendix on additional graph distance results).
Figure 3: Additional results on the ring plot toy model including additional architectures. We show here the performance of various models with $5$, $10$, or $20$ layers. The unitary GCN is the only message passing architecture that achieves stable performance with added layers and can learn the task. Apart from message passing architectures, global transformer architectures like GPS can learn the task when given Laplacian positional encoding. The trivial performance corresponding to outputting the average output is shown as a dotted horizontal line.
Figure 4: Test accuracies on Mutag for GCN, GIN, GAT, and a GCN with UniConv layers with increasing number of layers. Except for the unitary network, all other message passing architectures collapse to trivial accuracy levels as the number of layers increases.
Figure 5: Training and test MAE of the distance learning task for the dihedral group. Listed as column headers are the number of convolution layers in each network. Residual and Unitary convolutional networks are both able to learn the task under default hyperparameters for the optimizer.
...and 2 more figures
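
The contrast described in the Figure 1 caption can be reproduced with a short simulation. This is a rough sketch, not the code behind the figure: it uses the 80-node ring from the caption, takes $c = 1/3$ (an assumption based on the operator norm of ${\bm{I}}+{\bm{A}}$ being 3 on a ring), starts from a unit impulse at node 0, and applies $\exp(i{\bm{A}})$ through a truncated Taylor series. The step counts and the `spread` measure are illustrative choices.

```python
def ring_apply(x):
    """Multiply by the ring adjacency: (A x)_j = x_{j-1} + x_{j+1}."""
    n = len(x)
    return [x[(j - 1) % n] + x[(j + 1) % n] for j in range(n)]

def standard_step(x, c=1.0 / 3.0):
    """One linear message-passing step x <- c (x + A x)."""
    ax = ring_apply(x)
    return [c * (xi + axi) for xi, axi in zip(x, ax)]

def unitary_step(x, terms=40):
    """One step x <- exp(iA) x via a truncated Taylor series applied to the vector."""
    result = list(x)
    term = list(x)
    for k in range(1, terms):
        term = [1j * v / k for v in ring_apply(term)]  # (iA)^k x / k!
        result = [r + t for r, t in zip(result, term)]
    return result

def norm(x):
    return sum(abs(v) ** 2 for v in x) ** 0.5

def spread(x):
    """Norm of x after removing its mean; zero exactly when all nodes agree."""
    mean = sum(x) / len(x)
    return norm([v - mean for v in x])

n = 80
start = [0.0] * n
start[0] = 1.0  # unit impulse at node 0

x_std = list(start)
for _ in range(5000):
    x_std = standard_step(x_std)

x_uni = [complex(v) for v in start]
for _ in range(50):
    x_uni = unitary_step(x_uni)

print("standard spread:", spread(x_std))
print("unitary norm:", norm(x_uni), "unitary spread:", spread(x_uni))
```

Under these assumptions the standard iterates collapse toward a constant vector, so their spread shrinks toward zero, while the unitary iterates keep unit norm and a spread that does not decay.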

Theorems & Definitions (22)

Definition 1: Separable unitary graph convolution (UniConv)
Remark 2
Definition 3: Lie orthogonal/unitary graph convolution (Lie UniConv)
Example 1: Convolution on regular representation (Lie algebra)
Proposition 3
Definition 4: Rayleigh quotient (Chung, 1997)
Proposition 4: Invariance of Rayleigh quotient
Proposition 4
Definition 5: Dynamical isometry
Example 2
...and 12 more
