Unlocking In-Context Learning for Natural Datasets Beyond Language Modelling
Jelena Bratulić, Sudhanshu Mittal, David T. Hoffmann, Samuel Böhm, Robin Tibor Schirrmeister, Tonio Ball, Christian Rupprecht, Thomas Brox
TL;DR
The paper studies how In-Context Learning (ICL) can emerge in transformer models trained on modalities other than language, by examining the learning dynamics of induction heads. It shows that enforcing exact token copies in training sequences (instCopy) simplifies the look-up that induction heads perform and promotes the formation of the previous-token head. This enables ICL on visual datasets and EEG data, settings in which ICL was previously unstable or absent. The study also finds that making the In-Weight Learning (IWL) task sufficiently challenging (through more classes, label noise, or instance discrimination) promotes ICL, which highlights a crucial interplay between ICL and IWL. Together, these findings extend ICL to noisy real-world modalities, enabling rapid adaptation to new visual and EEG tasks without weight updates, with implications for cross-domain generalization and real-time brain-computer interfaces.
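To make the instCopy idea concrete, here is a minimal Python sketch of sampling one few-shot training sequence. It assumes a sequence format of interleaved (exemplar, label) context pairs followed by a query, and uses string ids as stand-ins for images or EEG trials; `make_sequence`, `inst_copy`, `n_way`, and `shots` are hypothetical names, not the authors' code. With `inst_copy=True` the query is an exact copy of an exemplar already in the context, so its label can be retrieved by exact look-up; otherwise the query is a held-out instance of one of the context classes.

```python
import random
from typing import Dict, List, Tuple

# Toy dataset (assumption): class id -> distinct instance ids standing in for images or EEG trials.
Dataset = Dict[int, List[str]]


def make_sequence(data: Dataset, n_way: int, shots: int, inst_copy: bool,
                  rng: random.Random) -> Tuple[List[Tuple[str, int]], str, int]:
    """Hypothetical sampler for one few-shot sequence: context pairs plus a query."""
    classes = rng.sample(sorted(data), n_way)
    context: List[Tuple[str, int]] = []
    held_out: Dict[int, str] = {}
    for c in classes:
        # Draw shots + 1 distinct instances; the extra one never enters the context.
        picks = rng.sample(data[c], shots + 1)
        held_out[c] = picks[-1]
        context.extend((x, c) for x in picks[:shots])
    rng.shuffle(context)

    query_label = rng.choice(classes)
    if inst_copy:
        # instCopy: the query is an exact copy of an exemplar already in the context.
        query = rng.choice([x for x, c in context if c == query_label])
    else:
        # No copy: a different instance of the same class, unseen in this context.
        query = held_out[query_label]
    return context, query, query_label


if __name__ == "__main__":
    data = {c: [f"class{c}_inst{i}" for i in range(5)] for c in range(10)}
    rng = random.Random(0)
    ctx, query, label = make_sequence(data, n_way=2, shots=2, inst_copy=True, rng=rng)
    print(ctx, "->", query, label)
```

Sampling distinct instances per class keeps the non-copy setting strict: the query never appears verbatim anywhere in the context, so only the instCopy setting offers an exact match to look up.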
Abstract
Large Language Models (LLMs) exhibit In-Context Learning (ICL), which enables the model to perform new tasks by conditioning only on examples provided in the context, without updating its weights. While ICL offers fast adaptation across natural language tasks and domains, its emergence is less straightforward for modalities beyond text. In this work, we systematically uncover properties present in LLMs that support the emergence of ICL in autoregressive models across various modalities by promoting the learning of the mechanisms ICL requires. We identify exact token repetitions in the training sequences as an important factor for ICL. Such repetitions also improve stability and reduce transiency in ICL performance. Moreover, we emphasise the significance of training task difficulty for the emergence of ICL. Finally, by applying our novel insights into ICL emergence, we unlock ICL capabilities for various visual datasets and a more challenging EEG classification task.
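The abstract ties ICL emergence to training task difficulty, and the TL;DR lists label noise, more classes, and instance discrimination as ways to make the IWL task harder. The Python sketch below shows two such dataset transformations under stated assumptions: labels are stored in a dict from hypothetical instance ids to class ids, and noise is drawn once per instance, so every copy of an instance keeps the same (possibly wrong) label. The function names are illustrative only and are not taken from the paper.

```python
import random
from typing import Dict


def add_label_noise(labels: Dict[str, int], n_classes: int, p: float,
                    rng: random.Random) -> Dict[str, int]:
    """Return a copy of `labels` in which each instance's label is, with probability p,
    replaced by a different class drawn uniformly at random.

    Assumption: noise is drawn once per instance, so repeated (copied) occurrences
    of that instance keep the same, possibly wrong, label.
    """
    noisy: Dict[str, int] = {}
    for inst, y in labels.items():
        if rng.random() < p:
            noisy[inst] = rng.choice([c for c in range(n_classes) if c != y])
        else:
            noisy[inst] = y
    return noisy


def instance_discrimination(labels: Dict[str, int]) -> Dict[str, int]:
    """Give every instance its own label, so the number of classes equals the number of instances."""
    return {inst: i for i, inst in enumerate(sorted(labels))}
```

Both transformations leave the inputs untouched and only change what a model could memorise in its weights about the input-to-label mapping.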
