In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning
Xiaochuang Han
TL;DR
The paper demonstrates that a vanilla pretrained language model (Llama-2-vanilla) can follow chat-style instructions at inference time without fine-tuning, by performing in-context alignment. Retrieving an average of about 9 demonstrations from Open Assistant data with Contriever and placing them in the prompt alongside the test instruction raises the model's win-rate against OpenAI's text-davinci-003 from about 11.4% to 78.4%, roughly a 7x increase, which makes it comparable to some baselines with alignment fine-tuning. The approach is efficient, interpretable, and simple to deploy, since inference involves only prompts and demonstrations, not weight updates. Ablation analyses show that a strong base model, adequate context length, and targeted retrieval are all crucial, while qualitative examples illustrate both successes and limitations, including safety considerations. Overall, in-context alignment offers a practical, scalable alternative to fine-tuning for aligning vanilla LMs with instruction-following objectives, with potential for rapid evaluation and configurable alignment styles. Open questions include integrating RLHF within this framework and extending it to multi-turn or long-context dialogues.
Abstract
In this note, we explore inference-time alignment through in-context learning. We consider a vanilla pretrained language model, Llama-2, before any fine-tuning, and retrieve an average of 9 demonstration alignment examples when the model is prompted to follow chat-style instructions. Compared to direct prompting, in-context alignment, which leaves the model weights unchanged, leads to a 7x increase in win-rate against OpenAI's text-davinci-003 model, making the vanilla language model comparable to strong baselines with alignment fine-tuning.
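The retrieve-then-prompt idea described above can be illustrated with a minimal sketch. This is not the paper's implementation. The bag-of-words similarity is a toy stand-in for a dense retriever such as Contriever, the "User:/Assistant:" template is a hypothetical format, and the names `embed`, `retrieve`, `build_prompt`, and `max_chars` are assumptions introduced only for illustration. The resulting string would be passed to the vanilla model for completion.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a dense retriever)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, pool, k):
    """Return the k demonstrations whose prompts are most similar to the query."""
    q = embed(query)
    ranked = sorted(pool, key=lambda d: cosine(q, embed(d["prompt"])), reverse=True)
    return ranked[:k]


def build_prompt(query, demos, max_chars=4000):
    """Concatenate demonstrations, then the test prompt.

    Hypothetical template; max_chars is a crude stand-in for a context-length
    budget, so demonstrations that would overflow it are dropped.
    """
    tail = f"User: {query}\nAssistant:"
    blocks, used = [], len(tail)
    for d in demos:
        block = f"User: {d['prompt']}\nAssistant: {d['response']}\n\n"
        if used + len(block) > max_chars:
            break
        blocks.append(block)
        used += len(block)
    return "".join(blocks) + tail


# Tiny made-up demonstration pool, for illustration only.
POOL = [
    {"prompt": "How do I bake a cake?", "response": "Preheat the oven, mix the batter, and bake."},
    {"prompt": "What is the capital of France?", "response": "Paris."},
    {"prompt": "Explain recursion in programming.", "response": "A function that calls itself."},
]

if __name__ == "__main__":
    question = "How do I bake bread at home?"
    print(build_prompt(question, retrieve(question, POOL, k=2)))
```

Because the model weights never change in this setup, swapping the demonstration pool or the retriever is enough to change the alignment style, which is the configurability the note points to.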
