In-Context Learning of Linear Systems: Generalization Theory and Applications to Operator Learning
Frank Cole, Yulong Lu, Wuzhe Xu, Tianhao Zhang
TL;DR
The paper develops a theoretical framework for in-context learning of linear systems with a linear transformer, deriving an in-domain generalization bound that decays with prompt length and pre-training size. It introduces a novel notion of task diversity that characterizes when pre-trained transformers generalize under task distribution shifts, and proves that diversity is a sufficient (and in some cases necessary) condition for robust out-of-domain (OOD) performance. The results extend to in-context operator learning: an abstract operator-learning bound yields corollaries for linear elliptic PDEs. Numerical experiments on random matrices and linear elliptic PDEs validate the theoretical rates and show the practical effect of task diversity on out-of-domain generalization.
Abstract
We study theoretical guarantees for solving linear systems in-context using a linear transformer architecture. For in-domain generalization, we provide neural scaling laws that bound the generalization error in terms of the number of tasks and the sample sizes used in training and inference. For out-of-domain generalization, we find that the behavior of trained transformers under task distribution shifts depends crucially on the distribution of the tasks seen during training. We introduce a novel notion of task diversity and show that it defines a necessary and sufficient condition for pre-trained transformers to generalize under task distribution shifts. We also explore applications of learning linear systems in-context, such as in-context operator learning for PDEs. Finally, we provide numerical experiments to validate the established theory.
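To make the setting concrete, the following is a minimal sketch of in-context solving of a linear system. It is not the authors' implementation. It assumes that a task is an invertible matrix A, that a prompt consists of pairs (y_i, x_i) with A x_i = y_i and standard Gaussian y_i, and that the linear transformer is simplified to a single linear-attention readout whose weight matrix Q is fixed to the identity rather than trained. All function names are hypothetical.

```python
import numpy as np


def make_prompt(A, n, rng):
    """Sample n in-context pairs (y_i, x_i) satisfying A x_i = y_i.

    Assumption: right-hand sides y_i are drawn i.i.d. from N(0, I).
    """
    d = A.shape[0]
    Y = rng.standard_normal((n, d))      # rows are y_i
    X = np.linalg.solve(A, Y.T).T        # rows are x_i = A^{-1} y_i
    return Y, X


def linear_attention_predict(Y, X, y_query, Q=None):
    """Simplified linear-attention readout: (1/n sum_i x_i y_i^T) Q y_query.

    With y_i ~ N(0, I), the averaged outer product estimates A^{-1},
    so Q = I already gives a consistent prediction of A^{-1} y_query.
    """
    n, d = Y.shape
    if Q is None:
        Q = np.eye(d)                    # untrained placeholder weight
    M = X.T @ Y / n                      # empirical estimate of E[x y^T]
    return M @ Q @ y_query


def estimation_error(A, n, rng):
    """Frobenius error between the in-context estimate and A^{-1}."""
    Y, X = make_prompt(A, n, rng)
    M = X.T @ Y / n
    return np.linalg.norm(M - np.linalg.inv(A))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4
    A = np.eye(d) + 0.1 * rng.standard_normal((d, d))  # hypothetical task
    for n in (20, 200, 2000, 20000):
        print(n, estimation_error(A, n, rng))
```

In this toy version the error shrinks as the prompt length n grows, which illustrates in spirit the dependence on prompt length described above. The dependence on the number of pre-training tasks and the effect of task diversity would require actually training Q over a distribution of matrices A, which this sketch omits.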
