Equivariant Matrix Function Neural Networks

Ilyes Batatia, Lars L. Schaaf, Huajie Chen, Gábor Csányi, Christoph Ortner, Felix A. Faber

TL;DR

The paper tackles the challenge of modeling non-local, many-body interactions in graphs, which standard MPNNs struggle with because of limited receptive fields, oversmoothing, and oversquashing. It introduces Matrix Function Neural Networks (MFNs), an equivariant GNN framework that parameterizes non-local interactions through analytic matrix functions defined on graph operators, with a resolvent-based parameterization enabling scalable evaluation. The architecture combines a local equivariant graph layer, the construction of self-adjoint, group-equivariant matrices, and a matrix-function update; when the matrices are sparse, selected inversion gives the potential for linear scaling with system size. Empirically, MFNs achieve state-of-the-art performance on the ZINC and TU datasets and capture complex non-local quantum interactions (e.g., in cumulenes), underscoring their potential for molecular modeling and force-field development.
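
To illustrate why a matrix function of a local graph operator is non-local, the following minimal NumPy sketch builds a nearest-neighbour operator on a path graph and applies a matrix function through an eigendecomposition. The helper names (`path_graph_operator`, `matrix_function`), the hopping value, and the choice of `exp` are assumptions made for illustration; they are not taken from the paper, where the operator is learned and the function is parameterized.

```python
import numpy as np

def path_graph_operator(n, onsite=0.0, hopping=-1.0):
    """Symmetric tridiagonal operator on an n-node path graph.

    Hypothetical stand-in for a learned self-adjoint matrix: only
    nearest neighbours are coupled.
    """
    H = np.zeros((n, n))
    idx = np.arange(n - 1)
    H[idx, idx + 1] = hopping
    H[idx + 1, idx] = hopping
    H[np.arange(n), np.arange(n)] = onsite
    return H

def matrix_function(H, f):
    """Apply a scalar function f to a symmetric matrix: f(H) = U f(Lambda) U^T."""
    evals, evecs = np.linalg.eigh(H)
    # Scale each eigenvector column by f(eigenvalue), then project back.
    return (evecs * f(evals)) @ evecs.T

n = 8
H = path_graph_operator(n)
F = matrix_function(H, np.exp)

# H has no direct coupling between the two ends of the chain ...
print("H[0, n-1] =", H[0, n - 1])
# ... but f(H) does, because it sums contributions from walks of every length.
print("F[0, n-1] =", F[0, n - 1])
```

A single application of the matrix function therefore connects nodes that a fixed number of message-passing steps with the same cutoff would not reach.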

Abstract

Graph Neural Networks (GNNs), especially message-passing neural networks (MPNNs), have emerged as powerful architectures for learning on graphs in diverse applications. However, MPNNs face challenges when modeling non-local interactions in graphs such as large conjugated molecules and social networks, due to oversmoothing and oversquashing. Although Spectral GNNs and traditional neural networks such as recurrent neural networks and transformers mitigate these challenges, they often lack generalizability or fail to capture detailed structural relationships or symmetries in the data. To address these concerns, we introduce Matrix Function Neural Networks (MFNs), a novel architecture that parameterizes non-local interactions through analytic matrix equivariant functions. Employing resolvent expansions offers a straightforward implementation and the potential for linear scaling with system size. The MFN architecture achieves state-of-the-art performance in standard graph benchmarks, such as the ZINC and TU datasets, and is able to capture intricate non-local interactions in quantum systems, paving the way to new state-of-the-art force fields.
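
As a rough illustration of a resolvent expansion, the sketch below approximates a matrix function by a weighted sum of resolvents, f(H) ≈ Σ_s w_s (z_s I − H)^{-1}, and checks that the result is equivariant under node permutations. The function name, the pole and weight values, and the use of conjugate pole pairs to keep the output real are assumptions for illustration only. A dense inverse is used here for simplicity; the linear scaling mentioned in the abstract would require a sparse technique such as selected inversion, which this sketch does not implement.

```python
import numpy as np

def resolvent_matrix_function(H, poles, weights):
    """Sum of weighted resolvents of a real symmetric matrix H.

    Each complex pole z (with nonzero imaginary part) is paired with its
    conjugate, so w (zI - H)^{-1} + conj(w) (conj(z) I - H)^{-1} equals
    2 * Re(w (zI - H)^{-1}) and the output stays real.
    """
    n = H.shape[0]
    eye = np.eye(n)
    out = np.zeros((n, n))
    for z, w in zip(poles, weights):
        R = np.linalg.inv(z * eye - H)  # dense resolvent (zI - H)^{-1}
        out += 2.0 * np.real(w * R)
    return out

rng = np.random.default_rng(0)
n = 6
A = rng.normal(size=(n, n))
H = 0.5 * (A + A.T)  # hypothetical symmetric operator

# Illustrative poles and weights; in a trained model these would be learned.
poles = np.array([0.5 + 1.0j, -1.0 + 2.0j])
weights = np.array([1.0 - 0.3j, 0.2 + 0.7j])

F = resolvent_matrix_function(H, poles, weights)

# Relabel the nodes with a permutation matrix P and recompute.
perm = rng.permutation(n)
P = np.eye(n)[perm]
F_perm = resolvent_matrix_function(P @ H @ P.T, poles, weights)

# Equivariance: f(P H P^T) = P f(H) P^T.
equivariant = np.allclose(F_perm, P @ F @ P.T)
print("permutation equivariant:", equivariant)
```

The same argument applies to any invertible orthogonal transformation of H, since (z I − Q H Q^T)^{-1} = Q (z I − H)^{-1} Q^T.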

Paper Structure (53 sections, 28 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Matrix function network architecture. Illustrating matrix construction and non-locality of matrix functions on a molecular graph.
  • Figure 2: Block structure of a Euclidean MFN operator, ${\bf H}$. Each entry in ${\bf H}$ corresponds to a different product of representations of the group of rotations $(ss, sp, pp,...)$. Example for L=1.
  • Figure 3: Visualizing MFN expressivity on cumulene chains. The left panel depicts energy trends with respect to cumulene chain length at a fixed angle $\phi=5^\circ$. The right panel shows the DFT (ground truth) and the predicted energy as a function of the dihedral angle $\phi$ between the hydrogen atoms for a cumulene chain containing 12 carbon atoms. Local many-body equivariant models (MACE) are only able to capture average trends, even though test configurations are included in the training set. Invariant MFNs ($L=0$) capture only the trends with respect to length, while equivariant MFNs ($L=1$) capture both non-local trends. All models have a cutoff distance $r_c$ of 3 Å, corresponding to the nearest neighbors, with two message-passing layers. The cutoff distance as well as MACE's receptive field for the first carbon atom is annotated in the left panel.
  • Figure 4: Block structure of a Euclidean operator, ${\bf H}$, learnt in the MFNs. Each entry in ${\bf H}$ corresponds to a different product of representations of the group of rotations $(ss, sp, pp,...)$.
  • Figure 5: Angles and distances that define a cumulene graph used to test expressivity in Figure 3. The carbon-hydrogen ($r_{CH}$), first carbon-carbon ($r_{CC}^{(1)}$), and remaining carbon-carbon distances ($r_{CC}^{(>1)}$) are set to 1.086 Å, 1.315 Å and 1.279 Å respectively. The hydrogen-carbon-hydrogen angle ($\theta_{HCH}$) is fixed at 118.71 degrees and the dihedral angle $\phi$ depends on the experiment as detailed in the main text.