
Stable Code Technical Report

Nikhil Pinnaparaju, Reshinth Adithyan, Duy Phung, Jonathan Tow, James Baicoianu, Ashish Datta, Maksym Zhuravinskyi, Dakota Mahan, Marco Bellagente, Carlos Riquelme, Nathan Cooper

TL;DR

Stable Code introduces a compact 3B-parameter decoder-only code language model and an instruction-tuned variant (Stable Code Instruct). Both perform competitively on code completion, reasoning, and multi-turn dialogue benchmarks while remaining efficient enough for edge deployment. The authors detail a diverse pretraining corpus, synthetic data generated via Evol-Instruct, and a long-context dataset. These are combined with a multi-stage training pipeline and a Fill-in-the-Middle (FIM) objective, which lets the model condition on code after the span it generates, not only on a prefix. Fine-tuning uses publicly available instruction data and Direct Preference Optimization (DPO) with safety datasets. The results on the Multi-PL, MT-Bench, and SQL-Eval benchmarks are strong and often match or approach those of larger models. The work also releases quantized checkpoints and reports inference throughput on edge devices, highlighting practical deployment benefits and contributing open models to code language modeling research.
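The FIM objective mentioned above can be made concrete with a minimal Python sketch of the common prefix-suffix-middle (PSM) data transformation. This is an illustration under stated assumptions, not the authors' pipeline. The sentinel strings `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` follow the StarCoder convention. The helper names, the character-level cut points, and the `fim_rate=0.5` default are hypothetical choices made here.

```python
import random

# Assumed StarCoder-style sentinel strings; hypothetical for this sketch.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def split_document(doc: str, rng: random.Random) -> tuple[str, str, str]:
    """Pick two random character cut points and split doc into (prefix, middle, suffix)."""
    i, j = sorted(rng.randint(0, len(doc)) for _ in range(2))
    return doc[:i], doc[i:j], doc[j:]


def to_psm(prefix: str, middle: str, suffix: str) -> str:
    """Prefix-Suffix-Middle order: prefix and suffix come first, the middle is the target."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


def fim_transform(doc: str, rng: random.Random, fim_rate: float = 0.5) -> str:
    """With probability fim_rate rewrite doc into PSM form; otherwise keep it left-to-right."""
    if rng.random() >= fim_rate:
        return doc
    return to_psm(*split_document(doc, rng))


rng = random.Random(0)
doc = "def add(a, b):\n    return a + b\n"
print(fim_transform(doc, rng, fim_rate=1.0))
```

The middle is placed last in the transformed string. As a result, ordinary left-to-right next-token prediction on it trains the model to generate a span conditioned on both its prefix and its suffix. At inference time, a prompt in the same layout that ends at `<fim_middle>` then serves as an infilling request.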

Abstract

We introduce Stable Code, the first in our new generation of code language models. It serves as a general-purpose base code language model targeting code completion, reasoning, math, and other software engineering tasks. Additionally, we introduce an instruction-tuned variant, Stable Code Instruct, which lets users converse with the model through a natural chat interface for question-answering and instruction-following tasks. In this technical report, we detail the data and training procedures behind both models. Their weights are available on Hugging Face for anyone to download and use at https://huggingface.co/stabilityai/stable-code-3b and https://huggingface.co/stabilityai/stable-code-instruct-3b. This report contains thorough evaluations of the models, including multilingual programming benchmarks and MT-Bench, which focuses on multi-turn dialogue. At the time of its release, Stable Code was the state-of-the-art open model under 3B parameters. On the popular Multi-PL benchmark, it even performs comparably to larger models with 7 billion and 15 billion parameters. Stable Code Instruct also exhibits state-of-the-art performance on the MT-Bench coding tasks and on Multi-PL completion compared to other instruction-tuned models. Given the models' small size, we also provide throughput measurements on a number of edge devices. In addition, we open-source several quantized checkpoints and report their performance relative to the original model.
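The abstract does not say which quantization formats the released checkpoints use. The following is therefore only a generic illustration of what quantizing weights trades away, not a description of those checkpoints. It is a minimal Python sketch of symmetric per-tensor int8 quantization, and the helper names and sample weights are hypothetical. The released checkpoints may well use other schemes, such as grouped or lower-bit formats.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8: map [-max|w|, max|w|] linearly onto [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Guard against an all-zero tensor, which would otherwise give a zero scale.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 codes and the shared scale."""
    return [v * scale for v in q]


def max_abs_error(original: list[float], restored: list[float]) -> float:
    """Largest per-weight deviation introduced by the quantization round trip."""
    return max(abs(a - b) for a, b in zip(original, restored))


weights = [0.5, -1.27, 0.0, 1.27, 0.013]  # hypothetical sample weights
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q, scale, max_abs_error(weights, restored))
```

Each weight is stored in one byte instead of two or four, and its round-trip error is at most half of `scale`. Model-level benchmark comparisons between quantized and original checkpoints measure how much this per-weight loss matters in practice.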

Paper Structure

This paper contains 25 sections, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Staged approach to training Stable Code 3B and Stable Code Instruct 3B.
  • Figure 2: Stable Code 3B Loss and Learning Rate Curves.
  • Figure 3: Code Performance Comparison of Stable Code 3B Scratch and Stable LM 3B Initializations