Learning Linear Regression with Low-Rank Tasks in-Context

Kaito Takanami; Takashi Takahashi; Yoshiyuki Kabashima

TL;DR

The paper studies the theoretical mechanisms of in-context learning (ICL) by analyzing a linear-attention transformer trained on low-rank linear regression tasks in the high-dimensional limit. It shows that ICL predictions decompose cleanly into an algorithmic signal and two noise components that can be suppressed. It also shows that statistical fluctuations in finite pre-training data induce an implicit regularization that stabilizes learning when tasks are low-rank. The generalization error undergoes a phase transition governed by the relationship between task dimensionality and task diversity, which has implications for pre-training strategy and curriculum design. The results offer a principled framework for interpreting how transformers learn task structure and adapt to out-of-distribution scenarios. They also give concrete predictions for generalization under the TM, IDG, and ODG evaluation protocols.
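To make the setting concrete, here is a minimal sketch that generates low-rank regression tasks and applies a linear-attention read-out to a prompt. It rests on several assumptions. The read-out uses the reduced parametrization y_hat = x_q^T Gamma h with h = (1/N) sum_i y_i x_i, which is common in analyses of linear attention. Tasks are w = U z, drawn from a shared r-dimensional subspace with orthonormal basis U, and inputs are standard Gaussian. These choices and all function names are hypothetical and may differ from the paper's exact model.

```python
import numpy as np

def sample_low_rank_tasks(n_tasks, d, r, rng):
    """Task vectors w = U z confined to a shared r-dimensional subspace of R^d.
    Assumed: U has orthonormal columns and z ~ N(0, I_r / r), so ||w||^2 is about 1."""
    U, _ = np.linalg.qr(rng.standard_normal((d, r)))    # d x r, orthonormal columns
    Z = rng.standard_normal((n_tasks, r)) / np.sqrt(r)
    return Z @ U.T, U                                   # W has shape (n_tasks, d)

def sample_prompt(w, n_ctx, noise, rng):
    """Context pairs y_i = w^T x_i + eps_i with x_i ~ N(0, I_d), plus one query pair."""
    d = w.shape[0]
    X = rng.standard_normal((n_ctx, d))
    y = X @ w + noise * rng.standard_normal(n_ctx)
    x_q = rng.standard_normal(d)
    y_q = x_q @ w + noise * rng.standard_normal()
    return X, y, x_q, y_q

def linear_attention_predict(Gamma, X, y, x_q):
    """Reduced linear-attention read-out: y_hat = x_q^T Gamma h, h = (1/N) sum_i y_i x_i."""
    h = X.T @ y / X.shape[0]
    return x_q @ Gamma @ h

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, r, n_ctx, noise = 50, 3, 40, 0.1
    W, U = sample_low_rank_tasks(200, d, r, rng)
    P = U @ U.T                                         # projector onto the task subspace
    for name, Gamma in [("Gamma = I", np.eye(d)), ("Gamma = U U^T", P)]:
        errs = []
        for w in W:
            X, y, x_q, y_q = sample_prompt(w, n_ctx, noise, rng)
            errs.append((linear_attention_predict(Gamma, X, y, x_q) - y_q) ** 2)
        print(f"{name:14s} mean squared error: {np.mean(errs):.3f}")
```

Gamma = I corresponds to one step of gradient descent from w = 0 with unit step size. The projector Gamma = U U^T stands in for a read-out that has already captured the shared task subspace. In this toy setting there are fewer context examples than dimensions (N = 40 < d = 50). Under those conditions the projected read-out should typically give a much smaller error, because it discards estimation noise outside the r task directions.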

Abstract

In-context learning (ICL) is a key building block of modern large language models, yet its theoretical mechanisms remain poorly understood. It is particularly mysterious how ICL operates in real-world applications where tasks have a common structure. In this work, we address this problem by analyzing a linear attention model trained on low-rank regression tasks. Within this setting, we precisely characterize the distribution of predictions and the generalization error in the high-dimensional limit. Moreover, we find that statistical fluctuations in finite pre-training data induce an implicit regularization. Finally, we identify a sharp phase transition of the generalization error governed by task structure. These results provide a framework for understanding how transformers learn to learn the task structure.
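The role of finite pre-training data can be explored numerically with a minimal sketch. It uses the same assumed reduced parametrization y_hat = x_q^T Gamma h. Gamma is fitted by least squares on prompts built from a finite pool of k tasks (the task diversity), all lying in an r-dimensional subspace. The error is then measured on tasks never seen during pre-training. The explicit ridge penalty `lam`, the scalings, and all function names are hypothetical. This finite-size toy does not reproduce the paper's high-dimensional analysis or its implicit-regularization result, and it need not show a sharp transition.

```python
import numpy as np

def make_sequences(W, n_seq, n_ctx, noise, rng):
    """Features vec(x_q h^T) and targets y_q for prompts whose task is a random row of W.
    Linear attention predicts y_hat = x_q^T Gamma h = <vec(Gamma), vec(x_q h^T)>."""
    n_tasks, d = W.shape
    feats = np.empty((n_seq, d * d))
    targets = np.empty(n_seq)
    for s in range(n_seq):
        w = W[rng.integers(n_tasks)]                   # finite pool: task diversity = n_tasks
        X = rng.standard_normal((n_ctx, d))
        y = X @ w + noise * rng.standard_normal(n_ctx)
        h = X.T @ y / n_ctx                            # context summary (1/N) sum_i y_i x_i
        x_q = rng.standard_normal(d)
        feats[s] = np.outer(x_q, h).ravel()
        targets[s] = x_q @ w + noise * rng.standard_normal()
    return feats, targets

def fit_gamma(feats, targets, lam):
    """Minimise mean squared error + lam * ||Gamma||_F^2 in closed form (explicit ridge, assumed)."""
    n, p = feats.shape
    g = np.linalg.solve(feats.T @ feats + lam * n * np.eye(p), feats.T @ targets)
    d = int(round(np.sqrt(p)))
    return g.reshape(d, d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, r, n_ctx, noise = 10, 2, 20, 0.1
    U, _ = np.linalg.qr(rng.standard_normal((d, r)))  # shared task subspace (assumed orthonormal)

    def draw_tasks(k):
        return (rng.standard_normal((k, r)) / np.sqrt(r)) @ U.T

    F_test, t_test = make_sequences(draw_tasks(1000), 2000, n_ctx, noise, rng)
    for k in (1, 2, 5, 50):                            # number of distinct pre-training tasks
        F, t = make_sequences(draw_tasks(k), 4000, n_ctx, noise, rng)
        Gamma = fit_gamma(F, t, lam=1e-4)
        err = np.mean((F_test @ Gamma.ravel() - t_test) ** 2)
        print(f"k = {k:3d}  test MSE on unseen tasks: {err:.3f}")
```

Because the prediction is linear in Gamma, pre-training reduces to ordinary ridge regression on the d^2 features vec(x_q h^T), which keeps the sketch closed-form. The main knob to turn is k relative to the task rank r.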

Figures (5)

Theorems & Definitions (12)