Algebraformer: A Neural Approach to Linear Systems

Pietro Sittoni, Francesco Tudisco

TL;DR

Algebraformer targets the challenge of ill-conditioned linear systems by introducing a Transformer-based end-to-end solver that encodes the system as column-wise patches, achieving $O(n^2)$ memory. The model, with 12 blocks and 9.5M parameters, is pretrained on diffusion PDEs and fine-tuned for spectral BVP interpolation and nonlinear optimization tasks, including Newton updates. It demonstrates robustness to ill-conditioning and noise, outperforms LSTM/GRU baselines, and provides speedups over classical solvers, highlighting the practicality of general-purpose neural architectures in scientific computing pipelines. The approach offers a scalable, plug-and-play alternative to handcrafted solvers, with demonstrated transferability to unseen equations and tasks in physics-informed contexts.

Abstract

Recent work in deep learning has opened new possibilities for solving classical algorithmic tasks using end-to-end learned models. In this work, we investigate the fundamental task of solving linear systems, particularly those that are ill-conditioned. Existing numerical methods for ill-conditioned systems often require careful parameter tuning, preconditioning, or domain-specific expertise to ensure accuracy and stability. We propose Algebraformer, a Transformer-based architecture that learns to solve linear systems end-to-end, even in the presence of severe ill-conditioning. Our model leverages a novel encoding scheme that enables efficient representation of matrix and vector inputs, with a memory complexity of $O(n^2)$, supporting scalable inference. We demonstrate its effectiveness on application-driven linear problems, including interpolation tasks from spectral methods for boundary value problems and acceleration of the Newton method. Algebraformer achieves competitive accuracy with significantly lower computational overhead at test time, demonstrating that general-purpose neural architectures can effectively reduce complexity in traditional scientific computing pipelines.

Paper Structure

This paper contains 19 sections, 22 equations, 6 figures, 1 table, 1 algorithm.

Figures (6)

  • Figure 1: Model overview. We split the matrix $A$ into column patches, attach one component of the vector $b$ to each patch, and embed each patch as input to a decoder-only Transformer backbone. Finally, we decode the output of the backbone to produce the solution vector $x$.
  • Figure 2: The two plots on the left show the relative MSE on the test set during training for \ref{eq:darcy_absorption} and \ref{eq:darcy_reaction}. The plot on the right displays the relative MSE on the test set with noisy data for \ref{eq:darcy}, where we compare Algebraformer with three different numerical methods.
  • Figure 3: Time to convergence for Newton and accelerated Newton methods.
  • Figure 4: Three different samples of the function $K$
  • Figure 5: Three different samples of the function $f$
  • ...and 1 more figure
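
The column-patch encoding described in the Figure 1 caption can be illustrated with a minimal NumPy sketch. It assumes that patch $j$ is the $j$-th column of $A$ with the component $b_j$ appended, which gives $n$ tokens of length $n+1$ and is consistent with the stated $O(n^2)$ memory. The function names `encode_system` and `decode_system` are hypothetical, and the learned embedding and Transformer backbone are not shown; this covers only the raw token layout, not the authors' implementation.

```python
import numpy as np

def encode_system(A, b):
    """Build one token per column of A, with b[j] appended to column j.

    Assumed layout (not confirmed by the source): token j is
    [A[0, j], ..., A[n-1, j], b[j]], so the result has shape (n, n + 1).
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = A.shape[0]
    if A.shape != (n, n) or b.shape != (n,):
        raise ValueError("expected A of shape (n, n) and b of shape (n,)")
    # A.T puts column j of A in row j; b[:, None] adds b[j] as the last entry.
    return np.concatenate([A.T, b[:, None]], axis=1)

def decode_system(tokens):
    """Invert encode_system, recovering (A, b) from the token array."""
    return tokens[:, :-1].T, tokens[:, -1]

# Small example: a 4 x 4 system becomes 4 tokens of length 5.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
tokens = encode_system(A, b)
A_back, b_back = decode_system(tokens)
```

Under this layout the token array holds $n(n+1)$ numbers, so the input representation grows quadratically in $n$, and the sequence length seen by the backbone is $n$.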