In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning

Xiaochuang Han

TL;DR

The paper shows that a vanilla pretrained language model (Llama-2-vanilla) can follow chat-style instructions at inference time through in-context alignment, with no weight updates. For each test prompt, Contriever retrieves an average of about $9$ demonstrations from Open Assistant data, and these are placed in the context together with the prompt. This raises the model's win rate against OpenAI's text-davinci-003 from about $11.4\%$ to $78.4\%$, which makes it comparable to some baselines with alignment fine-tuning. Because only prompts and demonstrations change at inference, the approach is efficient, interpretable, and simple to deploy. Ablations indicate that a strong base model, adequate context length, and targeted retrieval are all crucial; qualitative examples cover both successes and limitations, including safety considerations. Overall, in-context alignment offers a practical, scalable alternative to fine-tuning for aligning vanilla LMs with instruction-following objectives, with potential for rapid evaluation and configurable alignment styles. Open questions include integrating RLHF within this framework and extending it to multi-turn or long-context dialogues.
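
As a quick arithmetic check, the two win rates quoted above are consistent with the "7x" figure given in the abstract, which appears to be a rounded ratio:

$$\frac{78.4\%}{11.4\%} \approx 6.9 \approx 7$$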

Abstract

In this note, we explore inference-time alignment through in-context learning. We consider a vanilla pretrained language model Llama-2 before any fine-tuning and retrieve an average of 9 demonstration alignment examples when the model is prompted to follow chat-style instructions. Compared to direct prompting, the in-context alignment without changing model weights leads to a 7x increase in win-rate w.r.t. the text-davinci-003 model from OpenAI, making the vanilla language model comparable to strong baselines with alignment fine-tuning.
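
The retrieve-then-prompt idea described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the bag-of-words `embed` function is a toy stand-in for a dense retriever such as Contriever, the `User:`/`Assistant:` template and the word budget `max_words` are hypothetical, and the step that sends the assembled context to the vanilla language model is left out.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a dense retriever)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(test_prompt, alignment_data, max_words):
    """Rank (prompt, response) pairs by similarity to the test prompt and keep
    the top-ranked ones that fit a word budget (a crude proxy for context length)."""
    query = embed(test_prompt)
    ranked = sorted(
        alignment_data, key=lambda ex: cosine(query, embed(ex[0])), reverse=True
    )
    chosen, used = [], 0
    for prompt, response in ranked:
        cost = len(prompt.split()) + len(response.split())
        if used + cost > max_words:
            break
        chosen.append((prompt, response))
        used += cost
    return chosen


def build_context(test_prompt, demonstrations):
    """Hypothetical template: demonstrations first, then the test prompt
    with an open response slot for the model to complete."""
    blocks = [f"User: {p}\nAssistant: {r}" for p, r in demonstrations]
    blocks.append(f"User: {test_prompt}\nAssistant:")
    return "\n\n".join(blocks)


# Tiny made-up alignment set, for illustration only.
alignment_data = [
    ("How do I bake bread?", "Mix flour, water, yeast and salt, then knead, proof and bake."),
    ("What is the capital of France?", "The capital of France is Paris."),
    ("How do I bake a cake?", "Cream butter and sugar, add eggs and flour, then bake."),
]
demos = retrieve("How do I bake cookies?", alignment_data, max_words=40)
context = build_context("How do I bake cookies?", demos)
print(context)
```

In this toy run the two baking examples are retrieved and the unrelated one is dropped once the budget is exhausted. In a real setting, the number of demonstrations that fit would depend on the model's context length, and `context` would be passed to the pretrained model for completion.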

Paper Structure (14 sections, 1 equation, 1 figure, 4 tables)

Background
In-Context Alignment
Setup
Vanilla language model $\theta$
Alignment data $D$
Retriever $R$
Test prompt $p$
Demonstration template
Sampling strategy
Results
Main evaluation
Ablation
Qualitative examples
Implications

Figures (1)

Figure 1: In-context alignment with vanilla pretrained Llama-2 before any fine-tuning. Compared to direct prompting, retrieving an average of 9 alignment demonstrations at inference time (from about 9K candidate examples) yields a roughly 7x higher win rate against OpenAI's text-davinci-003 in our evaluation.
