Approximating Matrix Functions with Deep Neural Networks and Transformers

Rahul Padmanabhan; Simone Brugiapaglia

TL;DR

The paper addresses learning matrix functions, such as $e^A$ and $\operatorname{sign}(A)$, with neural networks. It pairs a theoretical result with an empirical study. The theoretical result shows that a ReLU DNN can approximate $e^A$ over $[-M,M]^{n\times n}$ with width exponential in $nM$ and depth roughly linear in $nM$. The empirical study shows that a transformer encoder-decoder with numerical encodings can reach high accuracy for certain matrix functions on small matrices ($3\times3$ to $5\times5$). Results depend strongly on the encoding scheme: FP15 performs best for the sign function and B1999 for the exponential, while sine and cosine remain challenging. Overall, the work highlights both the potential and the limitations of transformer-based surrogates for matrix-function computation in scientific computing, and points to encoding design as a key lever for performance.
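As a point of reference for the target function $e^A$, the following is a minimal sketch of the classical truncated power series $\sum_{k=0}^{K} A^k/k!$ in dependency-free Python. It is not the paper's network or its training pipeline; the helper names (`mat_mul`, `identity`, `expm_taylor`) and the default truncation order `terms=30` are assumptions chosen for illustration, and the plain series is only suitable for small matrices with moderate entries.

```python
import math


def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    inner = len(B)
    cols = len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def identity(n):
    """Return the n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def expm_taylor(A, terms=30):
    """Approximate e^A by the truncated series sum_{k=0}^{terms} A^k / k!.

    `terms` is an illustrative default, not a value taken from the paper.
    """
    n = len(A)
    result = identity(n)
    term = identity(n)  # invariant: term == A^k / k! after iteration k
    for k in range(1, terms + 1):
        term = mat_mul(term, A)
        term = [[x / k for x in row] for row in term]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result


# Usage: a diagonal matrix exponentiates entrywise along the diagonal.
D = expm_taylor([[1.0, 0.0], [0.0, 2.0]])
print(abs(D[0][0] - math.e) < 1e-9, abs(D[1][1] - math.exp(2.0)) < 1e-9)
```

The series converges for every square matrix, but the number of terms needed grows with the size of the entries, which is consistent with the $nM$ dependence in the bounds summarized above.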

Abstract

Transformers have revolutionized natural language processing, but their use for numerical computation has received less attention. We study the approximation of matrix functions, which map scalar functions to matrices, using neural networks including transformers. We focus on functions mapping square matrices to square matrices of the same dimension. These types of matrix functions appear throughout scientific computing, e.g., the matrix exponential in continuous-time Markov chains and the matrix sign function in stability analysis of dynamical systems. In this paper, we make two contributions. First, we prove bounds on the width and depth of ReLU networks needed to approximate the matrix exponential to an arbitrary precision. Second, we show experimentally that a transformer encoder-decoder with suitable numerical encodings can approximate certain matrix functions at a relative error of 5% with high probability. Our study reveals that the encoding scheme strongly affects performance, with different schemes working better for different functions.
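To make the phrase "a relative error of 5% with high probability" concrete, here is a hedged sketch of a tolerance-based accuracy: the fraction of test matrices whose prediction falls within a relative tolerance of the true output. The choice of the Frobenius norm and the function names are assumptions made for this sketch; the paper's own definition of the metric may use a different norm or normalization.

```python
import math


def frobenius_norm(A):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in A for x in row))


def relative_error(pred, target):
    """Relative error ||pred - target|| / ||target|| (Frobenius norm assumed)."""
    diff = [[p - t for p, t in zip(p_row, t_row)]
            for p_row, t_row in zip(pred, target)]
    denom = frobenius_norm(target)
    if denom == 0.0:
        # A zero target is only matched by an exactly zero prediction.
        return 0.0 if frobenius_norm(diff) == 0.0 else math.inf
    return frobenius_norm(diff) / denom


def tolerance_accuracy(preds, targets, tol=0.05):
    """Fraction of samples whose relative error is at most `tol`."""
    hits = sum(1 for p, t in zip(preds, targets) if relative_error(p, t) <= tol)
    return hits / len(preds)


# Usage: one prediction within 5% of its target, one far outside it.
targets = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
preds = [[[1.01, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 1.0]]]
print(tolerance_accuracy(preds, targets))
```

Under this reading, an accuracy figure reports how often the model lands inside the tolerance band, not how large the average error is.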

Paper Structure (14 sections, 3 theorems, 6 equations, 1 figure, 3 tables)

Introduction
Preliminaries
Theoretical Results: DNN Bounds for Matrix Exponential
Numerical Experiments
Experimental Setup
Baseline Methods.
Baseline Method Loss Functions.
Transformer Encoder-Decoder.
Baseline Results
Transformer Encoder-Decoder Results
Conclusions
Future Work.
Acknowledgments.
Disclosure of Interests.

Key Result

Lemma (Approximate Multiplication by ReLU DNNs)

Let $0 < \delta < 1$, $l \in \mathbb{N}$, and $M = \prod_{i=1}^{l} M_i \geq 1$. There exists a ReLU DNN $\chi^{(l)}_{\delta}$ with width $\leq c_1 \cdot l$, and depth $\leq c_2(1 + \log(l)[\log(l\delta^{-1}) + \log(M)])$, where $c_1, c_2 > 0$ are universal constants.
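To see why a network that approximates products of scalars is relevant to $e^A$, it may help to write the power series entrywise. The identities below are standard linear algebra. That the paper's proof follows exactly this route is an assumption inferred from the lemma titles listed on this page, and the truncation order $K$ is a symbol introduced here for illustration.

$$
e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!},
\qquad
(A^k)_{ij} = \sum_{i_1,\dots,i_{k-1}=1}^{n} A_{i i_1} A_{i_1 i_2} \cdots A_{i_{k-1} j}.
$$

Each entry of $A^k$ is therefore a sum of $n^{k-1}$ products of $k$ scalar entries, which is the kind of product a network such as $\chi^{(l)}_{\delta}$ with $l = k$ factors would presumably be used to approximate. For $A \in [-M,M]^{n\times n}$ with $M$ the entry bound, the maximum absolute row sum satisfies $\|A\|_\infty \le nM$, so the truncation error obeys

$$
\left\| e^A - \sum_{k=0}^{K} \frac{A^k}{k!} \right\|_\infty
\le \sum_{k=K+1}^{\infty} \frac{(nM)^k}{k!}.
$$

This is one way to see why the quantity $nM$ would govern the size of the approximating network. Note that the $M$ in the lemma is defined as a product $\prod_{i=1}^{l} M_i$ and is not the same symbol as the entry bound used here.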

Figures (1)

Figure 1: Maximum tolerance-based accuracy decreases as matrix dimension increases. From dimension 3 onwards, accuracy is less than 3% for all functions. Experiments were run up to dimension 8.

Theorems & Definitions (6)

Definition: Matrix Function via Jordan Canonical Form
Lemma: Approximate Multiplication by ReLU DNNs (cited as adcock2025near)
Lemma: Matrix Power Representation
Theorem: DNN Architecture for Matrix Exponential
Proof: Sketch
Definition: Tolerance-Based Accuracy Metric
