Luth: Efficient French Specialization for Small Language Models and Cross-Lingual Transfer

Maxence Lasbordes, Sinoué Gad

TL;DR

The paper targets the English-centric bias of large language models by introducing Luth, a family of French-specialized small language models (SLMs) built via targeted post-training on a high-quality French dataset (Luth-SFT). It shows that this approach, especially when combined with model merging, yields state-of-the-art French performance for models in this size class while preserving or even improving English capabilities through cross-lingual transfer. Key contributions include the Luth-SFT dataset (570k French instruction–response samples, ~338M tokens), five models (350M–1.7B parameters), a reproducible adaptation workflow, and merging strategies (LERP/SLERP) that mitigate forgetting. The results show French gains of up to 11.26 percentage points across six benchmarks alongside robust English performance, establishing Luth as a strong baseline for future French-language research and a scalable template for other languages and larger models.
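As a rough illustration of the two merging strategies named above, here is a minimal sketch of linear (LERP) and spherical (SLERP) interpolation between two flat weight vectors. It is a generic textbook formulation, not the paper's actual merging setup: the function names `lerp` and `slerp`, the interpolation factor `t`, and the toy vectors are assumptions made for this example, and a real merge would apply such an operation to each parameter tensor of the base and fine-tuned models.

```python
import math


def lerp(a, b, t):
    """Linear interpolation: (1 - t) * a + t * b, element by element."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def slerp(a, b, t, eps=1e-8):
    """Spherical interpolation along the arc between the directions of a and b.

    Falls back to lerp when either vector is (near) zero or when the two
    vectors are (near) parallel, where the spherical formula is ill-conditioned.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a < eps or norm_b < eps:
        return lerp(a, b, t)

    # Angle between the two vectors, clamped to guard against rounding error.
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        return lerp(a, b, t)

    weight_a = math.sin((1.0 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


# Toy stand-ins for one flattened parameter tensor (hypothetical values).
base_weights = [1.0, 0.0]
finetuned_weights = [0.0, 1.0]

merged_linear = lerp(base_weights, finetuned_weights, 0.5)
merged_spherical = slerp(base_weights, finetuned_weights, 0.5)
```

With these two orthogonal unit vectors, the linear midpoint has a smaller norm than either input, while the spherical midpoint keeps unit norm. That norm-preserving behavior is the usual motivation for preferring SLERP when the two sets of weights point in noticeably different directions.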

Abstract

The landscape of Large Language Models (LLMs) remains predominantly English-centric, resulting in a significant performance gap for other major languages, such as French, especially in the context of Small Language Models (SLMs). Existing multilingual models demonstrate considerably lower performance in French compared to English, and research on efficient adaptation methods for French remains limited. To address this, we introduce Luth, a family of French-specialized SLMs: through targeted post-training on curated, high-quality French data, our models outperform all open-source counterparts of comparable size on multiple French benchmarks while retaining their original English capabilities. We further show that strategic model merging enhances performance in both languages, establishing Luth as a new state of the art for French SLMs and a robust baseline for future French-language research.

Paper Structure

This paper contains 21 sections, 2 equations, 4 figures, 9 tables.

Figures (4)

  • Figure 1: Overview of the four main stages in constructing the Luth models, including their substeps, methods, and frameworks.
  • Figure 2: Overview of the Luth-SFT dataset construction pipeline, from data collection and translation to filtering and the Scholar subset creation.
  • Figure 3: Loss per step during full fine-tuning on the Luth-SFT dataset over 3 epochs for Qwen3-0.6B (green) and Qwen3-1.7B (blue).
  • Figure 4: Performance comparison of the Luth models in their base form (e.g., Qwen3-0.6B), after fine-tuning (e.g., Qwen3-0.6B fine-tuned), and after merging (e.g., Luth-0.6B-Instruct), averaged over four French/English benchmarks: IFEval, MMLU, GPQA-Diamond, and Math500. Left panel shows English performance, right panel shows French performance.