Is In-Context Learning Sufficient for Instruction Following in LLMs?
Hao Zhao, Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
TL;DR
The paper evaluates whether in-context learning (ICL) alone can achieve instruction following in large language models, focusing on URIAL prompts and comparing them to instruction fine-tuning (IFT) on MT-Bench. It demonstrates that decoding configurations and the quality of demonstrations are critical for ICL effectiveness, and shows that high-quality, carefully selected in-context demonstrations can close part of the gap to IFT, though not fully for multi-turn interactions. A systematic comparison reveals that ICL and IFT are nearly equivalent for single-turn tasks in the low-data regime, while IFT generalizes better to multi-turn conversations. The work clarifies when ICL is viable and when fine-tuning remains superior, and releases code to support replication and further exploration.
Abstract
In-context learning (ICL) allows LLMs to learn from examples without changing their weights: this is a particularly promising capability for long-context LLMs that can potentially learn from many examples. Recently, Lin et al. (2024) proposed URIAL, a method using only three in-context examples to align base LLMs, achieving non-trivial instruction following performance. In this work, we show that, while effective, ICL alignment with URIAL still underperforms instruction fine-tuning on the established benchmark MT-Bench, especially with more capable base LLMs. We then uncover the most relevant elements for successful in-context alignment, finding that the decoding parameters play a crucial role. Based on these insights, we show that the approach of URIAL can indeed be improved by adding high-quality demonstrations in context, potentially carefully selected via greedy search, which brings it closer to the performance of instruct models. Finally, we provide what is, to our knowledge, the first systematic comparison of ICL and instruction fine-tuning (IFT) for instruction following in the low-data regime, where ICL can be a viable alternative to IFT. Overall, our work advances the understanding of ICL as an alignment technique and its relationship to IFT. We provide our code at https://github.com/tml-epfl/icl-alignment.
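
The abstract mentions selecting demonstrations via greedy search without spelling out the procedure. As a rough illustration only, the sketch below shows one common form of greedy forward selection: starting from a fixed set of demonstrations, it repeatedly adds the candidate that most improves a score. The names `greedy_select` and `toy_score` are hypothetical and are not taken from the paper or its repository. In a real setting the score would presumably be a benchmark evaluation of the prompted base model; here it is a toy stand-in so the example runs on its own.

```python
from typing import Callable, List, Sequence, Tuple

# A demonstration is an (instruction, response) pair placed in the prompt.
Demo = Tuple[str, str]


def greedy_select(
    base: Sequence[Demo],
    candidates: Sequence[Demo],
    score: Callable[[Sequence[Demo]], float],
    k: int,
) -> List[Demo]:
    """Greedy forward selection (illustrative sketch, not the paper's code).

    Starting from `base`, add up to `k` candidates one at a time, each time
    choosing the candidate whose addition gives the highest score. Stops
    early if no remaining candidate strictly improves the score.
    """
    selected = list(base)
    remaining = list(candidates)
    best = score(selected)
    for _ in range(k):
        top_score, top_idx = best, None
        for i, cand in enumerate(remaining):
            s = score(selected + [cand])
            if s > top_score:  # strict improvement; ties keep the earlier candidate
                top_score, top_idx = s, i
        if top_idx is None:
            break  # nothing improves the score any further
        selected.append(remaining.pop(top_idx))
        best = top_score
    return selected


def toy_score(demos: Sequence[Demo]) -> float:
    """Hypothetical stand-in for a benchmark score.

    Counts how many distinct leading verbs the instructions cover, as a
    crude proxy for demonstration diversity.
    """
    return float(len({inst.split()[0].lower() for inst, _ in demos}))


base_demos = [("Explain gravity", "Gravity is the attraction between masses.")]
candidate_demos = [
    ("Explain photosynthesis", "Plants convert light into chemical energy."),
    ("Write a haiku", "Quiet pond at dusk."),
    ("Translate hello", "Bonjour."),
]
chosen = greedy_select(base_demos, candidate_demos, toy_score, k=2)
for instruction, _ in chosen:
    print(instruction)
```

Each greedy step calls the score once per remaining candidate, so with an expensive scorer (for example, one that runs generation and judging) the cost grows with both the pool size and `k`. That cost is a property of this sketch; the abstract does not state how the authors' search is budgeted.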
