Command R7B Arabic: A Small, Enterprise Focused, Multilingual, and Culturally Aware Arabic LLM

Yazeed Alnumay, Alexandre Barbet, Anna Bialas, William Darling, Shaan Desai, Joan Devassy, Kyle Duffy, Stephanie Howe, Olivia Lasche, Justin Lee, Anirudh Shrinivason, Jennifer Tracey

TL;DR

The paper tackles the challenge of deploying high-quality enterprise Arabic LLMs in the face of limited digitized data. It proposes a data-synthesis pipeline with a human-in-the-loop and an iterative post-training recipe, augmented by expert-model merging to efficiently specialize a compact 7B Arabic system. The resulting Command R7B Arabic outperforms similarly sized peers on key Arabic benchmarks, particularly in instruction following, RAG, and cultural knowledge, while preserving broad capabilities. This work offers a practical, scalable approach for building accessible Arabic NLP systems in enterprise settings and contributes a release-ready, open-weight model to the community.

Abstract

Building high-quality large language models (LLMs) for enterprise Arabic applications remains challenging due to the limited availability of digitized Arabic data. In this work, we present a data synthesis and refinement strategy to help address this problem, namely, by leveraging synthetic data generation and human-in-the-loop annotation to expand our Arabic training corpus. We further present our iterative post-training recipe, which is essential to achieving state-of-the-art performance in aligning the model with human preferences, a critical aspect of enterprise use cases. The culmination of this effort is the release of a small, 7B, open-weight model that outperforms similarly sized peers in head-to-head comparisons and on Arabic-focused benchmarks covering cultural knowledge, instruction following, RAG, and contextual faithfulness.

Paper Structure

This paper contains 14 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Evaluations on enterprise usability factors (mArenaHard, described in the results section). Auto win-rates on the Arabic version of the LMSYS Arena "Hard" human preference tasks [aya_expanse]. Command R7B Arabic outperforms all listed similarly sized models.
  • Figure 2: Outline of Command R7B Arabic's training process, which has three stages; each stage trains multiple experts that are merged into a single general model. For instance, in the SFT stage, multiple SFT expert models are trained to excel in specific domains, such as mathematics or instruction following. These experts are subsequently merged to create a generalist SFT model via parameter-wise linear interpolation of the experts' weights.
  • Figure 3: Flowchart for our iterative supervised refinement approach. It ensures that all datasets used improve targeted model performance by mixing a base data mixture with a targeted dataset that is iteratively improved via multilingual arbitrage.
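
The expert merging mentioned in the Figure 2 caption (parameter-wise linear interpolation of the experts' weights) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `merge_experts`, the flat `dict` of float lists standing in for model parameters, and the choice to normalize the coefficients so they sum to 1 are all assumptions made for illustration. The summary above does not state how the actual merge coefficients are chosen.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a model checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def merge_experts(experts: Sequence[Weights], coeffs: Sequence[float]) -> Weights:
    """Merge expert checkpoints by parameter-wise linear interpolation.

    Each merged parameter is sum_k(alpha_k * theta_k), where the alphas are the
    given coefficients normalized to sum to 1 (an assumption of this sketch).
    """
    if not experts:
        raise ValueError("at least one expert is required")
    if len(experts) != len(coeffs):
        raise ValueError("need exactly one coefficient per expert")
    total = sum(coeffs)
    if total == 0:
        raise ValueError("coefficients must not sum to zero")
    alphas = [c / total for c in coeffs]

    # All experts must share the same architecture, i.e. the same parameter names.
    names = experts[0].keys()
    for expert in experts:
        if expert.keys() != names:
            raise ValueError("experts must have identical parameter names")

    merged: Weights = {}
    for name in names:
        size = len(experts[0][name])
        if any(len(expert[name]) != size for expert in experts):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Interpolate each scalar entry of this parameter across experts.
        merged[name] = [
            sum(alpha * expert[name][i] for alpha, expert in zip(alphas, experts))
            for i in range(size)
        ]
    return merged


# Toy usage: two hypothetical SFT experts with a single two-element parameter.
math_expert = {"layer.weight": [1.0, 2.0]}
instruction_expert = {"layer.weight": [3.0, 6.0]}
generalist = merge_experts([math_expert, instruction_expert], [0.5, 0.5])
```

With equal coefficients the merged parameter is the element-wise mean of the two experts. Unequal coefficients would shift the merged model toward one expert's domain, which is one plausible way to trade off capabilities when building the generalist model at each stage.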