Approximation of relation functions and attention mechanisms
Awni Altabaa, John Lafferty
TL;DR
The paper proves that relation functions modeled as inner products of neural network feature maps are universal approximators: the inner product of a multi-layer perceptron with itself approximates any symmetric positive-definite relation, and the inner product of two different multi-layer perceptrons approximates any asymmetric relation. The symmetric function class corresponds to kernels of reproducing kernel Hilbert spaces (RKHS), and the asymmetric class to kernels of reproducing kernel Banach spaces (RKBS). In both cases the paper bounds the number of neurons needed for a given accuracy, with the bounds linked to kernel spectrum decay and Barron-norm smoothness. Applied to Transformers, the results show that attention can approximate any retrieval mechanism defined by an abstract preorder, using Debreu's representation theorem from economics to express the preorder through utility functions. The work thereby connects neural-network approximation theory, kernel methods, and attention mechanisms.
Abstract
Inner products of neural network feature maps arise in a wide variety of machine learning frameworks as a method of modeling relations between inputs. This work studies the approximation properties of inner products of neural networks. It is shown that the inner product of a multi-layer perceptron with itself is a universal approximator for symmetric positive-definite relation functions. In the case of asymmetric relation functions, it is shown that the inner product of two different multi-layer perceptrons is a universal approximator. In both cases, a bound is obtained on the number of neurons required to achieve a given accuracy of approximation. In the symmetric case, the function class can be identified with kernels of reproducing kernel Hilbert spaces, whereas in the asymmetric case the function class can be identified with kernels of reproducing kernel Banach spaces. Finally, these approximation results are applied to analyzing the attention mechanism underlying Transformers, showing that any retrieval mechanism defined by an abstract preorder can be approximated by attention through its inner product relations. This result uses the Debreu representation theorem in economics to represent preference relations in terms of utility functions.
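The objects named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' construction: the helper names (`init`, `mlp`, `attention_weights`), the one-hidden-layer ReLU architecture, the random untrained weights, and the inverse-temperature parameter `beta` are all assumptions made for illustration. The sketch shows that the inner product of one feature map with itself yields a symmetric positive semi-definite relation matrix, that two different feature maps yield a generally asymmetric one, and that softmax attention over relation scores concentrates on the highest-scoring key as `beta` grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def init(d, hidden, k):
    """Random (untrained) parameters for a one-hidden-layer MLP: R^d -> R^k."""
    return (rng.normal(size=(d, hidden)),
            rng.normal(size=hidden),
            rng.normal(size=(hidden, k)))

def mlp(params, x):
    """ReLU MLP feature map applied row-wise to x of shape (n, d)."""
    W1, b1, W2 = params
    return np.maximum(x @ W1 + b1, 0.0) @ W2

def attention_weights(scores, beta=1.0):
    """Row-wise softmax of beta * scores (numerically stabilized)."""
    z = beta * scores
    z = z - z.max(axis=-1, keepdims=True)
    w = np.exp(z)
    return w / w.sum(axis=-1, keepdims=True)

d, hidden, k, n = 3, 16, 4, 5
phi = init(d, hidden, k)   # feature map shared by both arguments (symmetric case)
psi = init(d, hidden, k)   # second, different feature map (asymmetric case)
X = rng.normal(size=(n, d))

# Symmetric relation: r(x, y) = <phi(x), phi(y)>, a Gram matrix.
R_sym = mlp(phi, X) @ mlp(phi, X).T

# Asymmetric relation: r(x, y) = <phi(x), psi(y)>.
R_asym = mlp(phi, X) @ mlp(psi, X).T

# Attention over the asymmetric relation: row i weights the keys for query i.
A = attention_weights(R_asym, beta=1.0)

# With a large beta, attention approaches a hard selection of the top-scoring key.
toy_scores = np.array([[0.1, 0.9, 0.3]])
hard = attention_weights(toy_scores, beta=100.0)
```

In this sketch the relation score plays the role of a utility over keys for each query, so selecting the maximal key under a preorder corresponds to the large-`beta` limit of the attention weights.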
