Arabic Stable LM: Adapting Stable LM 2 1.6B to Arabic

Zaid Alyafeai, Michael Pieler, Hannah Teufel, Jonathan Tow, Marco Bellagente, Duy Phung, Nikhil Pinnaparaju, Reshinth Adithyan, Paulo Rocha, Maksym Zhuravinskyi, Carlos Riquelme

TL;DR

This paper introduces Arabic Stable LM 1.6B, in base and chat versions, as a small but powerful Arabic-centric LLM. It also shows the benefit of mixing in synthetic instruction-tuning data by augmenting the authors' fine-tuning data with a large synthetic dialogue dataset.

Abstract

Large Language Models (LLMs) have shown impressive results in multiple domains of natural language processing (NLP) but are mainly focused on the English language. Recently, more LLMs have incorporated a larger proportion of multilingual text to represent low-resource languages. In Arabic NLP, several Arabic-centric LLMs have shown remarkable results on multiple benchmarks in the past two years. However, most Arabic LLMs have more than 7 billion parameters, which increases their hardware requirements and inference latency compared to smaller LLMs. This paper introduces Arabic Stable LM 1.6B, in base and chat versions, as a small but powerful Arabic-centric LLM. Our Arabic Stable LM 1.6B chat model achieves impressive results on several benchmarks, beating multiple models with up to 8x the parameters. In addition, we show the benefit of mixing in synthetic instruction-tuning data by augmenting our fine-tuning data with a large synthetic dialogue dataset.

Paper Structure

This paper contains 19 sections, 11 figures, 13 tables.

Figures (11)

  • Figure 1: Fertility scores of multiple tokenizers on the PADT and OSCAR datasets.
  • Figure 2: Different learning rate schedulers. The late cool down setup (red) consists of a warm up, cosine and inverse square root, and late cool down phase. The early cool down setup (blue) consists of a warm up, cosine and inverse square root, and early cool down phase.
  • Figure 3: ArabicMMLU benchmark differences between the cloze format (CF) and the multiple-choice format (MCF) of our Arabic Stable LM 1.6B base and chat models (ar-stablelm-2-base and ar-stablelm-2-chat) and other LLMs. Differences are calculated by subtracting the MCF results from the CF results. Our base and chat models are highlighted in green and yellow, respectively.
  • Figure 4: ArabicMMLU results for the early and late cool down learning rate scheduler. The early cool down starts at 120B tokens, and the late at 190B tokens.
  • Figure 5: ArabicMMLU benchmark results of the Arabic Stable LM 1.6B base model with the cloze format (CF) and multiple choice format (MCF).
  • ...and 6 more figures
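
The fertility score named in Figure 1 is commonly defined as the average number of tokens a tokenizer produces per word, with lower values meaning less fragmentation. The following is a minimal sketch under that assumption. The whitespace word split, the `fertility` helper, and the three toy tokenizers are all hypothetical illustrations; they are not the tokenizers or the exact procedure used in the paper.

```python
from typing import Callable, Iterable, List


def fertility(tokenize: Callable[[str], List[str]], texts: Iterable[str]) -> float:
    """Average tokens per whitespace-separated word (assumed definition)."""
    n_words = 0
    n_tokens = 0
    for text in texts:
        n_words += len(text.split())      # words: naive whitespace split
        n_tokens += len(tokenize(text))   # tokens: whatever the tokenizer emits
    if n_words == 0:
        raise ValueError("input contains no words")
    return n_tokens / n_words


# Toy tokenizers, for illustration only.
def word_tokenizer(text: str) -> List[str]:
    # One token per word: the lowest possible fertility under this definition.
    return text.split()


def char_tokenizer(text: str) -> List[str]:
    # One token per character: heavy fragmentation.
    return [ch for word in text.split() for ch in word]


def chunk_tokenizer(text: str, size: int = 3) -> List[str]:
    # Fixed-size pieces within each word, a crude stand-in for subword units.
    return [w[i:i + size] for w in text.split() for i in range(0, len(w), size)]


if __name__ == "__main__":
    sample = ["كتاب جديد"]  # two Arabic words, used only as example input
    for name, tok in [("word", word_tokenizer),
                      ("chunk", chunk_tokenizer),
                      ("char", char_tokenizer)]:
        print(name, fertility(tok, sample))
```

To compare real tokenizers, pass each tokenizer's own tokenize function in place of the toy ones and evaluate all of them on the same texts.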