Devstral: Fine-tuning Language Models for Coding Agent Applications

Abhinav Rastogi, Adam Yang, Albert Q. Jiang, Alexander H. Liu, Alexandre Sablayrolles, Amélie Héliou, Amélie Martin, Anmol Agarwal, Andy Ehrenberg, Andy Lo, Antoine Roux, Arthur Darcet, Arthur Mensch, Baptiste Bout, Baptiste Rozière, Baudouin De Monicault, Chris Bamford, Christian Wallenwein, Christophe Renaudin, Clémence Lanfranchi, Clément Denoix, Corentin Barreau, Darius Dabert, Devon Mizelle, Diego de las Casas, Elliot Chane-Sane, Emilien Fugier, Emma Bou Hanna, Gabrielle Berrada, Gauthier Delerce, Gauthier Guinet, Georgii Novikov, Graham Neubig, Guillaume Lample, Guillaume Martin, Himanshu Jaju, Jan Ludziejewski, Jason Rute, Jean-Malo Delignon, Jean-Hadrien Chabran, Joachim Studnia, Joep Barmentlo, Jonas Amar, Josselin Somerville Roberts, Julien Denize, Karan Saxena, Karmesh Yadav, Kartik Khandelwal, Khyathi Raghavi Chandu, Kush Jain, Lélio Renard Lavaud, Léonard Blier, Lingxiao Zhao, Louis Martin, Lucile Saulnier, Luyu Gao, Marie Pellat, Mathilde Guillaumin, Mathis Felardos, Matthieu Dinot, Maxime Darrin, Maximilian Augustin, Mickaël Seznec, Neha Gupta, Nikhil Raghuraman, Olivier Duchenne, Patricia Wang, Patrick von Platen, Patryk Saffer, Paul Jacob, Paul Wambergue, Paula Kurylowicz, Philomène Chagniot, Pierre Stock, Pravesh Agrawal, Rémi Delacourt, Roman Soletskyi, Romain Sauvestre, Sagar Vaze, Sanchit Gandhi, Sandeep Subramanian, Shashwat Dalal, Siddharth Gandhi, Soham Ghosh, Srijan Mishra, Sumukh Aithal, Szymon Antoniak, Teven Le Scao, Thibaut Lavril, Thibault Schueller, Thomas Foubert, Thomas Robert, Thomas Wang, Timothée Lacroix, Tom Bewley, Valeriia Nemychnikova, Victor Paltz, Virgile Richard, Wen-Ding Li, William Marshall, Xingyao Wang, Xuanyu Zhang, Yihan Wan, Yunhao Tang

TL;DR

Devstral-Small targets open-source code agents by delivering a compact $24$B Transformer with long-context capacity ($128k$ tokens) and agent-centric training. The approach combines a specialized data pipeline (SWE-Gym/OpenHands CodeAct), a two-stage post-training regime with strict filtering, and policy optimization to enable robust, multi-step coding tasks. Empirical results on SWE-bench show state-of-the-art performance among open models, with notable gains over larger baselines and effective behavior under an iterative evaluation protocol. The work further demonstrates the value of high-quality data through a Devstral-Small-2507 data refresh and highlights practical impact for on-device deployment in software engineering workflows.

Abstract

We introduce Devstral-Small, a lightweight open-source model for code agents with the best performance among models below 100B parameters. In this technical report, we give an overview of how we designed and developed the model and specialized it for agentic software development. The resulting model, Devstral-Small, is a small 24B model that is fast and easy to serve. Despite its size, Devstral-Small attains competitive performance compared to models more than an order of magnitude larger.

Paper Structure

This paper contains 18 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Devstral-Small achieves state-of-the-art results among open models with the OpenHands scaffold. It outperforms models such as Qwen 3 235B and DeepSeek-V3 that are approximately 10 and 28 times larger respectively.
  • Figure 2: Devstral-Small compared to other models on any scaffold. We compare the performance of Devstral-Small to the reported SWE-Bench performance of GPT-4.1 mini and Claude 3.5 Haiku on custom scaffolds and to SWE-smith on the SWE-Agent scaffold.
  • Figure 3: Temperature scaling experiment visualization showing Pass@K performance trends on a logarithmic scale. Lower temperatures (blue) consistently improve with increased K, while higher temperatures (red/orange) exhibit more variable performance patterns.
  • Figure 4: Devstral-2507