In-context Learning vs. Instruction Tuning: The Case of Small and Multilingual Language Models

David Ponce; Thierry Etchegoyhen

TL;DR

This study compares in-context learning (ICL) and instruction tuning for small and multilingual language models, and evaluates Direct Preference Optimisation (DPO) as a lightweight alignment method applied to base models. Across English, French, and Spanish, and for models below 2B parameters, instruction tuning generally outperforms ICL, although URIAL, an ICL method, yields meaningful gains over zero-shot prompting and DPO narrows the gap further. The paper also presents a multilingual version of the Just-Eval-Instruct dataset, translated into French and Spanish, and analyses critical errors such as infinite loops and unintended code generation, which raise safety and reliability concerns. The findings point to the need for broader evaluation and for alignment methods beyond standard instruction tuning if smaller models are to follow instructions robustly across languages.

Abstract

Instruction following is a critical ability for Large Language Models to perform downstream tasks. The standard approach to instruction alignment has relied on a specific phase of model tuning over curated instruction datasets, optionally complemented with an alignment step over human preferences. Recent work has shown the potential of in-context learning (ICL) alternatives to guide base models towards instruction following. This type of approach is particularly relevant to extend instruction following across languages and models of varying sizes adapted to different types of usage. In this work we compare ICL and instruction fine-tuning in English, French and Spanish, on Small Language Models, and provide experimental results on applying Direct Preference Optimisation (DPO) over base models. Our results show that scenarios involving multilingual and smaller models result in downgraded ICL instruction following performance, only partially mitigated by DPO alignment. This study aims to further our understanding of current strengths and limitations of alternative methods for instruction following.
