Approximation of Permutation Invariant Polynomials by Transformers: Efficient Construction in Column-Size

Naoki Takeshita; Masaaki Imaizumi

TL;DR

This work shows that column-symmetric polynomials on matrices can be universally approximated by Transformers with a single attention head. The construction uses width $12\cdot(2d)^sN$ and depth $2sL+3s$, and its approximation error decays as $8^s\cdot N^{-L}$. It builds monomial column-symmetric polynomials through a rank-based decomposition, constructing higher-rank terms inductively from a combination of feed-forward and attention layers. Because the number of parameters does not depend on the input column count $n$, the construction is parameter-efficient. The main contributions are a constructive proof, explicit architecture parameters, and detailed error analyses that establish how depth, width, and rank affect approximation quality for column-symmetric polynomials on matrix inputs. The results highlight the potential of deep Transformers for symmetry-aware function approximation with favorable parameter efficiency, and the authors discuss practical considerations such as the effect of $d$, $s$, and positional encoding on scaling and applicability.
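The quoted size and error expressions can be tabulated directly. The sketch below is a minimal calculator under our own reading of the symbols, which this summary does not define: $d$ is the number of rows of the input matrix, $s$ is the rank of the target polynomial, and $N$ and $L$ are the width and depth parameters. The function name `architecture_size` is hypothetical, and the error expression is evaluated as a plain number even though it may hold only up to constants. The column count $n$ never appears as an argument, mirroring the claim that the parameter count is independent of $n$.

```python
def architecture_size(d: int, s: int, N: int, L: int) -> dict:
    """Evaluate the width, depth, and error-bound expressions from the TL;DR.

    Assumed meanings (not stated in the summary): d = rows of the input matrix,
    s = rank of the target polynomial, N and L = width and depth parameters.
    The column count n is deliberately absent from the signature.
    """
    width = 12 * (2 * d) ** s * N      # 12 * (2d)^s * N
    depth = 2 * s * L + 3 * s          # 2sL + 3s
    error_bound = 8 ** s * N ** (-L)   # 8^s * N^(-L), evaluated as a float
    return {"width": width, "depth": depth, "error_bound": error_bound}

for s in (1, 2):
    print(s, architecture_size(d=2, s=s, N=4, L=3))
```

For example, `architecture_size(2, 1, 4, 3)` gives width 192, depth 9, and error bound 0.125. Raising `s` from 1 to 2 with `d = 2` multiplies the width by $2d = 4$ (to 768), doubles the depth (to 18), and multiplies the error bound by 8 (to 1.0), so a larger `L` or `N` is needed to keep the error small as the rank grows.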

Abstract

Transformers are a type of neural network that has demonstrated remarkable performance across various domains, particularly in natural language processing. Motivated by this success, the theoretical understanding of transformers has attracted significant research attention. A notable example is the mathematical analysis of their approximation power, which provides theoretical support for the expressive capability observed in practice. In this study, we investigate the ability of transformers to approximate column-symmetric polynomials, an extension of symmetric polynomials to matrix inputs. By focusing on the algebraic properties of symmetric polynomials, we establish an explicit relationship between the size of a transformer network and its approximation capability, leveraging both the parameter efficiency of transformers and their compatibility with symmetry.
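To make column symmetry concrete, the sketch below assumes (our reading, not a definition quoted from the paper) that a polynomial $p$ on $X \in \mathbb{R}^{d\times n}$ is column-symmetric when $p(XP) = p(X)$ for every $n\times n$ permutation matrix $P$. It uses a hypothetical rank-1 example, $p(X) = \sum_{j=1}^{n} \prod_{i=1}^{d} X_{ij}^{\alpha_i}$, and shows one way a single attention head can aggregate over columns: with zero query and key weights the softmax is uniform, so the head returns the column mean, which ignores column order and uses weights whose shapes do not depend on $n$. The per-column monomial is computed exactly here, whereas a Transformer would have to approximate it with feed-forward layers. The helper names are illustrative, and this is not the paper's construction.

```python
import numpy as np

def rank_one_monomial(X: np.ndarray, alpha: np.ndarray) -> float:
    """Hypothetical rank-1 column-symmetric polynomial: sum_j prod_i X[i, j] ** alpha[i]."""
    per_column = np.prod(X ** alpha[:, None], axis=0)  # one monomial value per column
    return float(per_column.sum())                      # summing over columns ignores their order

def uniform_attention_pool(H: np.ndarray) -> np.ndarray:
    """Single-head self-attention with zero query/key weights and identity values.

    H has shape (features, n). The weight matrices depend only on `features`,
    not on the column count n.
    """
    f, n = H.shape
    W_q = np.zeros((1, f))                   # query projection (all zeros)
    W_k = np.zeros((1, f))                   # key projection (all zeros)
    scores = (W_q @ H).T @ (W_k @ H)         # (n, n) score matrix, all zeros
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax: 1/n everywhere
    return H @ weights.T                     # every output column is the column mean of H

rng = np.random.default_rng(0)
d, n = 3, 5
X = rng.uniform(-1.0, 1.0, size=(d, n))
alpha = np.array([2, 1, 0])
perm = rng.permutation(n)

p = rank_one_monomial(X, alpha)
print(np.isclose(p, rank_one_monomial(X[:, perm], alpha)))  # column order does not matter

features = np.prod(X ** alpha[:, None], axis=0, keepdims=True)  # shape (1, n)
pooled = uniform_attention_pool(features)
print(np.isclose(n * pooled[0, 0], p))  # n times the attention output recovers p(X)
```

The factor $n$ in the last line is applied outside the network purely for the check; the point of the sketch is only that mean pooling by attention is permutation invariant and size-agnostic in its parameters.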

The full paper contains 7 figures and 26 theorems and definitions.