Bayes optimal learning of attention-indexed models

Fabrizio Boncoraglio; Emanuele Troiani; Vittorio Erba; Lenka Zdeborová

TL;DR

This work introduces the attention-indexed model (AIM), a theoretical framework for analyzing learning in deep attention layers. It proposes a matching approximate message passing algorithm and shows that gradient descent can reach optimal performance.

Abstract

We introduce the attention-indexed model (AIM), a theoretical framework for analyzing learning in deep attention layers. Inspired by multi-index models, AIM captures how token-level outputs emerge from layered bilinear interactions over high-dimensional embeddings. Unlike prior tractable attention models, AIM allows full-width key and query matrices, aligning more closely with practical transformers. Using tools from statistical mechanics and random matrix theory, we derive closed-form predictions for the Bayes-optimal generalization error and identify sharp phase transitions as a function of sample complexity, model width, and sequence length. We propose a matching approximate message passing algorithm and show that gradient descent can reach optimal performance. AIM offers a solvable playground for understanding learning in self-attention layers, which are key components of modern architectures.
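To make the phrase "bilinear interactions over high-dimensional embeddings" concrete, the following is a minimal NumPy sketch of a single attention layer with full-width key and query matrices. It is an illustration under stated assumptions, not the paper's definition of AIM: the function names, the softmax link, the Gaussian weights, and the `1/sqrt(d)` normalization are all hypothetical choices made here for readability.

```python
import numpy as np

def bilinear_scores(X, W_q, W_k):
    """Token-token scores that are bilinear in the embeddings.

    X   : (T, d) array, one d-dimensional embedding per token.
    W_q : (d, r) query matrix; W_k : (d, r) key matrix.
    "Full width" is read here as r comparable to d (assumption).
    The 1/sqrt(d) scaling is an illustrative choice only.
    """
    d = X.shape[1]
    Q = X @ W_q            # (T, r) queries
    K = X @ W_k            # (T, r) keys
    return (Q @ K.T) / np.sqrt(d)   # (T, T), equals X (W_q W_k^T) X^T / sqrt(d)

def softmax_rows(Z):
    """Numerically stable softmax applied to each row."""
    Z = Z - Z.max(axis=-1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=-1, keepdims=True)

def token_outputs(X, W_q, W_k):
    """Token-level output: a row-wise softmax of the bilinear scores.
    The softmax link is an assumption made for this sketch."""
    return softmax_rows(bilinear_scores(X, W_q, W_k))

# Small demo with hypothetical sizes: T tokens, embedding dimension d, width r = d.
rng = np.random.default_rng(0)
T, d = 3, 8
X = rng.standard_normal((T, d))
W_q = rng.standard_normal((d, d)) / np.sqrt(d)
W_k = rng.standard_normal((d, d)) / np.sqrt(d)
A = token_outputs(X, W_q, W_k)   # (T, T) attention matrix, rows sum to 1
```

The scores depend on the weights only through the `d × d` product `W_q @ W_k.T`, which is why a matrix-valued (rather than low-rank vector) unknown arises naturally once the key and query matrices are allowed to be full width.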

Paper Structure

Table of Contents

Key Result

Figures (5)

Theorems & Definitions (1)