Algebraformer: A Neural Approach to Linear Systems
Pietro Sittoni, Francesco Tudisco
TL;DR
Algebraformer addresses ill-conditioned linear systems with a Transformer-based end-to-end solver that encodes the system as column-wise patches, giving a memory complexity of $O(n^2)$. The model, with 12 blocks and 9.5M parameters, is pretrained on diffusion PDEs and fine-tuned for spectral BVP interpolation and nonlinear optimization tasks, including Newton updates. It is robust to ill-conditioning and noise, outperforms LSTM/GRU baselines, and runs faster than classical solvers, which supports the practicality of general-purpose neural architectures in scientific computing pipelines. The approach offers a scalable, plug-and-play alternative to handcrafted solvers, with demonstrated transfer to unseen equations and tasks in physics-informed contexts.
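To make the column-wise encoding and its $O(n^2)$ footprint concrete, the following is a minimal NumPy sketch of one possible tokenization. The function name `encode_system` and the exact layout (token $j$ is column $j$ of $A$ followed by the entry $b_j$) are assumptions made for illustration; the summary above does not specify how the right-hand side is attached to the patches.

```python
import numpy as np

def encode_system(A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical column-wise encoding of the system A x = b.

    Returns an array of shape (n, n + 1): row j is token j, made of
    column j of A followed by b[j]. This layout is an assumption.
    """
    n = A.shape[0]
    if A.shape != (n, n) or b.shape != (n,):
        raise ValueError("expected a square A of shape (n, n) and b of shape (n,)")
    # A.T has the columns of A as its rows; append b as one extra feature.
    return np.concatenate([A.T, b[:, None]], axis=1)

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

tokens = encode_system(A, b)
print(tokens.shape)  # n tokens, each of dimension n + 1
```

Under this layout the sequence holds $n(n+1)$ numbers and a self-attention score matrix over $n$ tokens has $n^2$ entries, so both scale as $O(n^2)$.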
Abstract
Recent work in deep learning has opened new possibilities for solving classical algorithmic tasks using end-to-end learned models. In this work, we investigate the fundamental task of solving linear systems, particularly those that are ill-conditioned. Existing numerical methods for ill-conditioned systems often require careful parameter tuning, preconditioning, or domain-specific expertise to ensure accuracy and stability. We propose Algebraformer, a Transformer-based architecture that learns to solve linear systems end-to-end, even in the presence of severe ill-conditioning. Our model leverages a novel encoding scheme that enables efficient representation of matrix and vector inputs, with a memory complexity of $O(n^2)$, supporting scalable inference. We demonstrate its effectiveness on application-driven linear problems, including interpolation tasks from spectral methods for boundary value problems and acceleration of the Newton method. Algebraformer achieves competitive accuracy with significantly lower computational overhead at test time, demonstrating that general-purpose neural architectures can effectively reduce complexity in traditional scientific computing pipelines.
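For context on the Newton application, each Newton iteration for a nonlinear system $F(x) = 0$ requires solving the linear system $J(x)\,\Delta x = -F(x)$, and this inner solve is the step a learned solver could replace. The sketch below is illustrative only: the `linear_solve` argument is a hypothetical hook, `np.linalg.solve` stands in for any learned model, and the toy problem is not taken from the paper.

```python
import numpy as np

def newton(F, J, x0, linear_solve=np.linalg.solve, tol=1e-10, max_iter=50):
    """Newton's method with a pluggable linear solver (hypothetical hook)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = F(x)
        if np.linalg.norm(r) < tol:
            break
        # Newton update: solve J(x) dx = -F(x). A learned solver would be
        # passed in as linear_solve; here we use a direct solve as a stand-in.
        dx = linear_solve(J(x), -r)
        x = x + dx
    return x

# Toy system (an assumption for illustration):
#   x0^2 + x1^2 = 4  and  x0 = x1, with root (sqrt(2), sqrt(2)).
def F(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]])

def J(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

root = newton(F, J, x0=[1.0, 2.0])
print(root)
```

Keeping the solver as an argument means the outer iteration is unchanged whether the inner solve is classical or learned, which is one way to read the plug-and-play claim.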
