Message-Passing GNNs Fail to Approximate Sparse Triangular Factorizations

Vladislav Trifonov, Ekaterina Muravleva, Ivan Oseledets

TL;DR

This work addresses the challenge of learning sparse triangular preconditioners for SPD matrices with graph neural networks. It shows that the inherent locality of message-passing architectures prevents them from capturing the non-local dependencies required for a factorization $A \approx L L^{\top}$, and that even non-local Graph Transformers fall short. The authors introduce a constructive, inverse-based approach: K-optimal preconditioners are obtained by minimizing $K\left(L^{\top} A^{-1} L\right)$, for which they derive an explicit solution. This enables a principled construction of sparse $L$ and a benchmark that combines synthetic non-local cases with SuiteSparse matrices. Empirically, standard MP-GNNs (including attention-based variants) fail to match the exact or K-optimal factors used as baselines, highlighting a barrier to learning non-local linear-algebra transformations with vanilla GNNs. The paper provides a practical benchmark and argues for architectural innovations beyond traditional message-passing to advance ML-enabled scientific computing, particularly for matrix factorization tasks.
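
To make the objective $K\left(L^{\top} A^{-1} L\right)$ concrete, here is a minimal sketch. It assumes that $K$ is Kaporin's K-condition number, $K(M) = \left(\tfrac{1}{n}\operatorname{tr} M\right)^{n} / \det M$; the summary itself does not spell out the definition. The function names are hypothetical, and the dense solve is for illustration only, not something one would do for large sparse matrices.

```python
import numpy as np

def log_k_condition(M):
    """log K(M) with K(M) = (trace(M)/n)^n / det(M) (assumed definition)."""
    n = M.shape[0]
    sign, logdet = np.linalg.slogdet(M)
    if sign <= 0:
        raise ValueError("M must have a positive determinant")
    # Work in log space to avoid overflow of (trace/n)^n for large n.
    return n * np.log(np.trace(M) / n) - logdet

def log_k_preconditioned(A, L):
    """log K(L^T A^{-1} L) for an SPD matrix A and a triangular factor L."""
    M = L.T @ np.linalg.solve(A, L)  # dense solve, illustration only
    return log_k_condition(M)

A = np.array([[4.0, 2.0], [2.0, 3.0]])
exact = log_k_preconditioned(A, np.linalg.cholesky(A))  # exact factor
trivial = log_k_preconditioned(A, np.eye(2))            # no preconditioning
print(exact, trivial)
```

By the arithmetic-geometric mean inequality, $K(M) \ge 1$ with equality only when $M$ is a multiple of the identity, so under this definition the exact Cholesky factor gives $\log K \approx 0$ and any other choice of $L$ gives a positive value.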

Abstract

Graph Neural Networks (GNNs) have been proposed as a tool for learning sparse matrix preconditioners, which are key components in accelerating linear solvers. This position paper argues that message-passing GNNs are fundamentally incapable of approximating sparse triangular factorizations. We demonstrate that message-passing GNNs fundamentally fail to approximate sparse triangular factorizations for classes of matrices for which high-quality preconditioners exist but require non-local dependencies. To illustrate this, we construct a set of baselines using both synthetic matrices and real-world examples from the SuiteSparse collection. Across a range of GNN architectures, including Graph Attention Networks and Graph Transformers, we observe severe performance degradation compared to exact or K-optimal factorizations, with cosine similarity dropping below $0.6$ in key cases. Our theoretical and empirical results suggest that architectural innovations beyond message-passing are necessary for applying GNNs to scientific computing tasks such as matrix factorization. Experiments demonstrate that overcoming non-locality alone is insufficient. Tailored architectures are necessary to capture the required dependencies since even a completely non-local Graph Transformer fails to match the proposed baselines.
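
The locality argument can be illustrated with a small sketch. It assumes the usual message-passing model in which each layer lets a node aggregate information only from its direct neighbours, so after $k$ layers a node can depend only on nodes at most $k$ hops away. The helper `receptive_field` is a hypothetical name introduced for this example, and the path graph stands in for the sparsity pattern of a tridiagonal matrix.

```python
import numpy as np

def receptive_field(adjacency, num_layers):
    """Boolean matrix R where R[i, j] is True if node j can influence
    node i after `num_layers` rounds of neighbour aggregation."""
    n = adjacency.shape[0]
    # One round: each node sees itself and its direct neighbours.
    hop = (adjacency != 0) | np.eye(n, dtype=bool)
    reach = np.eye(n, dtype=bool)
    for _ in range(num_layers):
        reach = (reach.astype(int) @ hop.astype(int)) > 0
    return reach

n = 10
# Graph of a tridiagonal matrix: a path 0 - 1 - ... - 9.
path = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
reach = receptive_field(path, num_layers=3)
print(np.flatnonzero(reach[-1]))  # nodes that can influence the last node
```

With three layers the last node only sees nodes 6 to 9, so a change at node 0 cannot affect its output, whatever the learned weights are.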

Paper Structure

This paper contains 23 sections, 1 theorem, 24 equations, 4 figures, 1 algorithm.

Key Result

Theorem 2.1

Let $A$ be a tridiagonal symmetric positive definite $n \times n$ matrix. Then it can be factorized as $A = L L^{\top}$, where $L$ is a bidiagonal lower triangular matrix, and the mapping $A \rightarrow L$ is not local. That is, there exist matrices $A$ and $A'$ such that $A - A'$ has only one non-zero element, while the difference $L - L'$ has dense support: many entries change significantly, even though onl…
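
A small numerical experiment illustrates the statement. The sketch below is not the paper's own counterexample: the second matrix is a hypothetical one chosen for this illustration, built as $A = L L^{\top}$ with $L$ having unit diagonal and subdiagonal entries equal to $2$, so that $A$ has diagonal $(1, 5, \dots, 5)$ and off-diagonal entries $2$. The function names and the perturbation size `eps` are likewise assumptions.

```python
import numpy as np

def tridiag_cholesky(diag, off):
    """Cholesky factor of an SPD tridiagonal matrix.

    Returns the main diagonal and the subdiagonal of the bidiagonal L.
    """
    diag = np.asarray(diag, dtype=float)
    off = np.asarray(off, dtype=float)
    n = len(diag)
    l_diag = np.empty(n)
    l_sub = np.empty(n - 1)
    l_diag[0] = np.sqrt(diag[0])
    for k in range(n - 1):
        l_sub[k] = off[k] / l_diag[k]
        l_diag[k + 1] = np.sqrt(diag[k + 1] - l_sub[k] ** 2)
    return l_diag, l_sub

def diag_change_after_perturbation(diag, off, eps=1e-3):
    """|diag(L) - diag(L')| when only the entry A_11 is increased by eps."""
    base, _ = tridiag_cholesky(diag, off)
    perturbed = np.array(diag, dtype=float)
    perturbed[0] += eps
    changed, _ = tridiag_cholesky(perturbed, off)
    return np.abs(base - changed)

n = 16
# 1D Laplacian: tridiag(-1, 2, -1).
laplace_change = diag_change_after_perturbation(
    np.full(n, 2.0), np.full(n - 1, -1.0))

# Hypothetical non-local example: diagonal (1, 5, ..., 5), off-diagonal 2.
counter_diag = np.full(n, 5.0)
counter_diag[0] = 1.0
counter_change = diag_change_after_perturbation(
    counter_diag, np.full(n - 1, 2.0))

print("Laplacian, last diagonal entry change:", laplace_change[-1])
print("Non-local example, last diagonal entry change:", counter_change[-1])
```

For the Laplacian the effect of the perturbation decays along the diagonal. In the second matrix the recurrence for the squared diagonal entries, $d_{k+1} = 5 - 4/d_k$, has an unstable fixed point at $d = 1$, so a tiny change in $A_{11}$ pushes the remaining diagonal entries of $L'$ towards $2$ instead of $1$.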

Figures (4)

  • Figure 1: Difference between the diagonal elements of the Cholesky factor $L$ and of the factor $L'$ obtained after perturbing the single entry $A_{11}$ of the tridiagonal matrix. (Left) 1D Laplacian matrix. (Right) Counterexample.
  • Figure 2: Experiments on the synthetic dataset. Cosine similarity between true $L$ and predicted $L(\theta)$ factors of preconditioners during training. Higher is better.
  • Figure 3: Experiments on the K-optimal preconditioners for the SuiteSparse subset. Cosine similarity between true $L$ and predicted $L(\theta)$ factors of preconditioners during training. Higher is better.
  • Figure 4: Performance of the K-optimal and IC(0) preconditioners when solving linear systems from the SuiteSparse subset. K-optimal only: the IC(0) preconditioner failed. K-optimal less iterations: both preconditioners were successful, with superior K-optimal performance. IC(0) less iterations: both preconditioners were successful, with superior IC(0) performance. Both failed: both preconditioners failed.
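
The cosine similarity reported in Figures 2 and 3 can be sketched as follows. This assumes the similarity is taken over the matrix entries, i.e. the Frobenius inner product of the two factors divided by the product of their Frobenius norms; the captions do not specify the exact form, and `factor_cosine_similarity` is a hypothetical name.

```python
import numpy as np

def factor_cosine_similarity(L_true, L_pred):
    """Cosine similarity between two factors, treated as flat vectors."""
    inner = np.sum(L_true * L_pred)  # Frobenius inner product
    norms = np.linalg.norm(L_true) * np.linalg.norm(L_pred)
    return inner / norms

L_true = np.array([[1.0, 0.0], [2.0, 1.0]])
L_pred = np.array([[2.0, 0.0], [1.0, 2.0]])
similarity = factor_cosine_similarity(L_true, L_pred)
rescaled = factor_cosine_similarity(L_true, 3.0 * L_true)
print(similarity, rescaled)
```

Under this assumption the metric ignores the overall scale of the predicted factor, so a value of 1 means the prediction matches the true factor up to a positive multiple.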

Theorems & Definitions (2)

  • Theorem 2.1
  • proof