Neural Network Quantum Field Theory from Transformer Architectures
Dmitry S. Ageev, Yulia A. Ageeva
TL;DR
The paper proposes a neural-network quantum-field-theory (NN-QFT) construction of Euclidean scalar fields from transformer attention heads, defining $n$-point correlators as averages over random parameter ensembles. A single head with shared softmax weights yields non-Gaussian statistics that persist in the infinite-width limit $d_k\to\infty$: the connected four-point function retains a finite independence-breaking contribution, expressible as a covariance over the query-key weights. Euclidean-invariant kernels can be engineered via random-feature token embeddings. When $N_h$ independent heads are summed with the standard $1/N_h$ variance normalization, connected non-Gaussian correlators are suppressed as $\mathcal{O}(1/N_h)$ and vanish as $N_h\to\infty$, yielding a Gaussian NN-QFT in the large-head limit. The work thus links transformer architectures to QFT-like behavior: non-Gaussianity appears at the single-head level and is washed out by multi-head averaging. The construction has potential for building interacting actions within the NN-QFT framework.
Abstract
We propose a neural-network construction of Euclidean scalar quantum field theories from transformer attention heads, defining $n$-point correlators by averaging over random network parameters in the NN-QFT framework. For a single attention head, shared random softmax weights couple different width coordinates and induce non-Gaussian field statistics that persist in the infinite-width limit $d_k\to\infty$. We compute the two-point function in an attention-weight representation and show how Euclidean-invariant kernels can be engineered via random-feature token embeddings. We then analyze the connected four-point function and identify an "independence-breaking" contribution, expressible as a covariance over query-key weights, which remains finite at infinite width. Finally, we show that summing many independent heads with standard $1/N_h$ normalization suppresses connected non-Gaussian correlators as $1/N_h$, yielding a Gaussian NN-QFT in the large-head limit.
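
The $1/N_h$ suppression stated above can be illustrated by a short cumulant-counting argument. This is a minimal sketch, not the paper's derivation. It assumes that the single-head fields $\phi_h$ are independent, identically distributed, and have zero mean, and that the "standard $1/N_h$ normalization" acts on the variance, so each head enters the sum with amplitude $N_h^{-1/2}$. The symbols $\phi_h$, $\Phi$, and $G_c^{(n)}$ are notation introduced here for illustration only.

$$
\Phi(x) = \frac{1}{\sqrt{N_h}} \sum_{h=1}^{N_h} \phi_h(x),
\qquad
\langle \Phi(x_1)\,\Phi(x_2) \rangle = \frac{1}{N_h} \sum_{h=1}^{N_h} \langle \phi_h(x_1)\,\phi_h(x_2) \rangle = \langle \phi(x_1)\,\phi(x_2) \rangle .
$$

$$
G_c^{(n)}[\Phi](x_1,\dots,x_n)
= N_h^{-n/2} \sum_{h=1}^{N_h} G_c^{(n)}[\phi_h](x_1,\dots,x_n)
= N_h^{\,1-n/2}\, G_c^{(n)}[\phi](x_1,\dots,x_n),
\qquad
G_c^{(4)}[\Phi] = \frac{1}{N_h}\, G_c^{(4)}[\phi] .
$$

The first line uses zero mean and independence to drop the cross-head terms, so the two-point function does not depend on $N_h$. The second line uses two properties of connected correlators (joint cumulants): they are multilinear in their arguments, and they add over independent summands. Under these assumptions the single-head connected four-point function, including any contribution that stays finite as $d_k\to\infty$, is divided by $N_h$ and vanishes as $N_h\to\infty$.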
