Message-Passing GNNs Fail to Approximate Sparse Triangular Factorizations
Vladislav Trifonov, Ekaterina Muravleva, Ivan Oseledets
TL;DR
This work addresses the challenge of learning sparse triangular preconditioners for symmetric positive definite (SPD) matrices with graph neural networks. It argues that the inherent locality of message-passing architectures prevents them from capturing the non-local dependencies required to approximate $A \approx L L^{\top}$, and that even fully non-local Graph Transformers do not close this gap. To build baselines, the authors introduce a constructive, inverse-based approach through K-optimal preconditioners, which minimize $K\left(L^{\top} A^{-1} L\right)$, and derive an explicit solution. This enables a principled construction of sparse $L$ and a benchmark that combines synthetic non-local cases with SuiteSparse matrices. Empirically, standard MP-GNNs (including attention-based variants) fall well short of both the K-optimal baselines and the exact factors, highlighting a barrier to learning non-local linear-algebra transformations with vanilla GNNs. The paper provides a practical benchmark and argues that architectural innovations beyond traditional message passing are needed to advance ML-enabled scientific computing, particularly for matrix factorization tasks.
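To make the K-optimal construction concrete, here is a minimal NumPy sketch. It assumes the usual (Kaporin) K-condition number, $K(B) = \left(\operatorname{tr}(B)/n\right)^n / \det(B)$, which satisfies $K(B) \ge 1$ with equality only for multiples of the identity. With $B = A^{-1}$ and $S_j$ the allowed row indices of column $j$, setting the gradient of $\log K\left(L^{\top} A^{-1} L\right)$ with respect to each column to zero gives the column-wise closed form $L_{S_j, j} = v_j / \sqrt{(v_j)_j}$ with $v_j = \left(B_{S_j S_j}\right)^{-1} e_j$. This is a stationarity-based sketch, analogous to a factorized sparse approximate inverse (FSAI) applied to $A^{-1}$, and it may differ in form from the paper's derivation. The helpers `kaporin_k` and `k_optimal_factor` and the random test matrix are hypothetical. With a full lower-triangular pattern the formula recovers the exact Cholesky factor ($K = 1$); a sparser pattern gives $K > 1$.

```python
import numpy as np

def kaporin_k(B):
    """K-condition number K(B) = (tr(B) / n)^n / det(B) of an SPD matrix B.
    Evaluated in log space to avoid overflow; K(B) >= 1, with equality iff B = c * I."""
    n = B.shape[0]
    sign, logdet = np.linalg.slogdet(B)
    if sign <= 0:
        raise ValueError("B must be symmetric positive definite")
    return float(np.exp(n * np.log(np.trace(B) / n) - logdet))

def k_optimal_factor(A, pattern):
    """Hypothetical helper: lower-triangular L with sparsity `pattern` minimizing
    K(L^T A^{-1} L). Column j is supported on S_j = {i >= j : pattern[i, j]} plus j."""
    n = A.shape[0]
    B = np.linalg.inv(A)  # dense inverse: only viable for small illustrative matrices
    L = np.zeros((n, n))
    for j in range(n):
        S = [i for i in range(j, n) if i == j or pattern[i, j]]  # j is always S[0]
        e = np.zeros(len(S))
        e[0] = 1.0
        v = np.linalg.solve(B[np.ix_(S, S)], e)  # v = (B_SS)^{-1} e_j, with v[0] > 0
        L[S, j] = v / np.sqrt(v[0])              # column scaling from the stationarity condition
    return L

# Random SPD test matrix (an assumption for illustration, not a benchmark matrix).
rng = np.random.default_rng(0)
n = 8
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
A_inv = np.linalg.inv(A)

# Full lower-triangular pattern: the K-optimal factor coincides with the Cholesky factor.
full = np.tril(np.ones((n, n), dtype=bool))
L_full = k_optimal_factor(A, full)
print(np.allclose(L_full, np.linalg.cholesky(A)))   # True
print(kaporin_k(L_full.T @ A_inv @ L_full))         # 1 up to rounding

# Sparse (lower bidiagonal) pattern: the best factor is no longer exact, so K > 1.
offsets = np.subtract.outer(np.arange(n), np.arange(n))  # offsets[i, j] = i - j
band = full & (offsets <= 1)
L_band = k_optimal_factor(A, band)
print(kaporin_k(L_band.T @ A_inv @ L_band))
```

Because it needs $A^{-1}$, a sketch like this is a way to construct reference targets for small problems rather than a practical preconditioner setup.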
Abstract
Graph Neural Networks (GNNs) have been proposed as a tool for learning sparse matrix preconditioners, which are key components in accelerating linear solvers. This position paper argues that message-passing GNNs are fundamentally incapable of approximating sparse triangular factorizations for classes of matrices where high-quality sparse preconditioners exist but require non-local dependencies. To illustrate this, we construct a set of baselines using both synthetic matrices and real-world examples from the SuiteSparse collection. Across a range of GNN architectures, including Graph Attention Networks and Graph Transformers, we observe severe performance degradation compared to exact or K-optimal factorizations, with cosine similarity dropping below $0.6$ in key cases. Our theoretical and empirical results suggest that architectural innovations beyond message passing are necessary for applying GNNs to scientific computing tasks such as matrix factorization. The experiments also show that overcoming non-locality alone is insufficient: even a completely non-local Graph Transformer fails to match the proposed baselines, so tailored architectures are needed to capture the required dependencies.
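The locality obstruction can be seen on a very small hypothetical example (not one of the paper's benchmark matrices): two tridiagonal SPD matrices on a path graph that differ only in the entry $A_{00}$. Both have exactly sparse (lower bidiagonal) Cholesky factors, yet the last diagonal entry of $L$ differs between them. Assuming the common encoding of $A$ as a graph, with diagonal entries as node features and off-diagonal entries as edge features, a $k$-layer message-passing GNN with $k < n-1$ sees identical inputs in the receptive field of node $n-1$ and must therefore produce the same output there for both matrices. The NumPy sketch below checks this; the helper `tridiag` is hypothetical, and the example illustrates only the locality argument, not why non-local Graph Transformers also fail.

```python
import numpy as np

def tridiag(n, diag, off, a00):
    """Symmetric tridiagonal matrix whose graph is the path 0-1-...-(n-1):
    `diag` on the diagonal, `off` on both off-diagonals, and A[0, 0] set to `a00`."""
    A = (np.diag(np.full(n, diag))
         + np.diag(np.full(n - 1, off), 1)
         + np.diag(np.full(n - 1, off), -1))
    A[0, 0] = a00
    return A

n = 64
A1 = tridiag(n, 2.0, 1.0, a00=1.0)  # A1 = G G^T for the unit lower bidiagonal G with ones below the diagonal
A2 = tridiag(n, 2.0, 1.0, a00=2.0)  # differs from A1 only in entry (0, 0)
L1, L2 = np.linalg.cholesky(A1), np.linalg.cholesky(A2)

# No fill-in: both exact factors are lower bidiagonal, so an exactly sparse factor exists.
for L in (L1, L2):
    assert np.allclose(np.tril(L, -2), 0.0)

# Assumed encoding: node features = diagonal of A, edge features = off-diagonal entries.
# With k = n - 2 layers, node n-1 can only receive information from nodes 1..n-1
# (its k-hop neighbourhood), and A1, A2 agree on that entire subgraph ...
k = n - 2
window = slice(n - 1 - k, n)
assert np.array_equal(A1[window, window], A2[window, window])

# ... yet the exact factors differ at node n-1: L1[-1, -1] = 1 and
# L2[-1, -1] = sqrt((n + 1) / n), about 1.0078 for n = 64.
print(L1[-1, -1], L2[-1, -1])
```

In this example the effect on the last pivot is $1/n$, so it shrinks only slowly with distance; truncating the receptive field discards information the exact factor depends on.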
