In-Context Function Learning in Large Language Models
Elif Akata, Konstantinos Voudouris, Vincent Fortuin, Eric Schulz
TL;DR
The paper investigates how large language models perform in-context learning (ICL) on continuous function tasks by casting ICL as Gaussian Process (GP) regression with known priors. It proposes a principled evaluation framework that uses GP regression as a lower bound on prediction error and a 1-nearest-neighbor (1-NN) rule as an upper bound, analyzes inductive biases via likelihood comparisons across kernels, and tests parameter-efficient post-training (SFT and GRPO) to steer these priors. Key findings: learning curves are GP-like and improve with model size; kernel smoothness shapes learning speed; and models show a bias toward rough functions in low dimensions that shifts toward smoother functions in higher dimensions. Post-training can steer these priors toward the structure of the training data, with GRPO offering better generalization. The work provides a quantitative framework for understanding and steering in-context function learning in LLMs, with implications for data-efficient continuous-function tasks and model alignment.
Abstract
Large language models (LLMs) can learn from a few demonstrations provided at inference time. We study this in-context learning phenomenon through the lens of Gaussian Processes (GPs). We build controlled experiments where models observe sequences of multivariate scalar-valued function samples drawn from known GP priors. We evaluate prediction error in relation to the number of demonstrations and compare against two principled references: (i) an empirical GP-regression learner that gives a lower bound on achievable error, and (ii) the expected error of a 1-nearest-neighbor (1-NN) rule, which gives a data-driven upper bound. Across model sizes, we find that LLM learning curves are strongly influenced by the function-generating kernels and approach the GP lower bound as the number of demonstrations increases. We then study the inductive biases of these models using a likelihood-based analysis. We find that LLM predictions are most likely under less smooth GP kernels. Finally, we explore whether post-training can shift these inductive biases and improve sample-efficiency on functions sampled from GPs with smoother kernels. We find that both reinforcement learning and supervised fine-tuning can effectively shift inductive biases in the direction of the training data. Together, our framework quantifies the extent to which LLMs behave like GP learners and provides tools for steering their inductive biases for continuous function learning tasks.
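To make the two reference curves concrete, the following is a minimal sketch of how a GP-regression lower bound and a 1-NN upper bound could be estimated on functions drawn from a known GP prior. It is not the authors' implementation. The squared-exponential kernel, the lengthscale, the uniform inputs on [0, 1], the jitter value, the squared-error metric, and all function names are assumptions chosen for illustration. An LLM's predictions would be scored at the same query points and compared against these two curves.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    # Squared-exponential kernel between point sets a (n, d) and b (m, d).
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def sample_gp_function(n_points, dim, rng, lengthscale=1.0, jitter=1e-6):
    # Draw inputs uniformly (assumption) and sample outputs from the GP prior.
    x = rng.uniform(0.0, 1.0, size=(n_points, dim))
    K = rbf_kernel(x, x, lengthscale) + jitter * np.eye(n_points)
    y = np.linalg.cholesky(K) @ rng.standard_normal(n_points)
    return x, y

def gp_posterior_mean(x_ctx, y_ctx, x_query, lengthscale=1.0, noise=1e-6):
    # Posterior mean of GP regression using the true generating kernel.
    K = rbf_kernel(x_ctx, x_ctx, lengthscale) + noise * np.eye(len(x_ctx))
    k_star = rbf_kernel(x_query, x_ctx, lengthscale)
    return k_star @ np.linalg.solve(K, y_ctx)

def one_nn_prediction(x_ctx, y_ctx, x_query):
    # Predict the output of the closest demonstration in input space.
    dist = np.linalg.norm(x_ctx[None, :, :] - x_query[:, None, :], axis=2)
    return y_ctx[np.argmin(dist, axis=1)]

def learning_curves(n_max=20, dim=1, n_functions=200, lengthscale=0.3, seed=0):
    # Mean squared error on the next point after n demonstrations, n = 1..n_max.
    rng = np.random.default_rng(seed)
    gp_err = np.zeros(n_max)
    nn_err = np.zeros(n_max)
    for _ in range(n_functions):
        x, y = sample_gp_function(n_max + 1, dim, rng, lengthscale)
        for n in range(1, n_max + 1):
            x_ctx, y_ctx = x[:n], y[:n]
            x_q, y_q = x[n:n + 1], y[n]
            gp_pred = gp_posterior_mean(x_ctx, y_ctx, x_q, lengthscale)[0]
            nn_pred = one_nn_prediction(x_ctx, y_ctx, x_q)[0]
            gp_err[n - 1] += (gp_pred - y_q) ** 2
            nn_err[n - 1] += (nn_pred - y_q) ** 2
    return gp_err / n_functions, nn_err / n_functions
```

Calling `learning_curves()` returns two arrays indexed by the number of demonstrations. Because the GP learner here uses the true generating kernel, its error serves as the floor, while the 1-NN curve gives a simple data-driven ceiling that any learner using the demonstrations sensibly should beat.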
