Xmodel-LM Technical Report

Yichuan Wang, Yang Liu, Yu Yan, Qun Wang, Xucheng Huang, Ling Jiang

TL;DR

This work introduces Xmodel-LM, a compact and efficient 1.1B-parameter language model pre-trained on around 2 trillion tokens that notably surpasses existing open-source language models of similar scale.
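The two headline numbers, 1.1B parameters and about 2 trillion tokens, can be put in perspective with a back-of-the-envelope calculation. The sketch below uses the common rule of thumb C ≈ 6ND for the training compute of a dense transformer with N parameters trained on D tokens. That rule is an assumption brought in for illustration and does not come from the report. Both inputs are the rounded values quoted above.

```python
# Back-of-the-envelope pre-training budget for a ~1.1B-parameter model
# trained on ~2T tokens. Inputs are the rounded figures from the TL;DR;
# the 6*N*D compute rule is a standard approximation, not a reported number.

N_PARAMS = 1.1e9   # model size in parameters (rounded)
N_TOKENS = 2.0e12  # pre-training tokens ("around 2 trillion")


def tokens_per_param(n_params: float, n_tokens: float) -> float:
    """Ratio of training tokens to model parameters."""
    return n_tokens / n_params


def approx_train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate dense-transformer training compute, C ~= 6 * N * D
    (about 2ND for the forward pass plus 4ND for the backward pass)."""
    return 6.0 * n_params * n_tokens


if __name__ == "__main__":
    ratio = tokens_per_param(N_PARAMS, N_TOKENS)
    flops = approx_train_flops(N_PARAMS, N_TOKENS)
    print(f"tokens per parameter: {ratio:.0f}")    # 1818
    print(f"approx. training FLOPs: {flops:.2e}")  # 1.32e+22
```

Roughly 1,800 tokens per parameter is about 90 times the ~20 tokens per parameter that the Chinchilla scaling study found compute-optimal, so a model of this size is trained well past that point. This trade-off spends extra training compute to obtain a smaller model that is cheaper to run.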

Abstract

We introduce Xmodel-LM, a compact and efficient 1.1B-parameter language model pre-trained on around 2 trillion tokens. It is trained on our self-built dataset (Xdata), whose balance of Chinese and English corpora was chosen by optimizing for downstream task performance. Despite its small size, Xmodel-LM performs remarkably well and notably surpasses existing open-source language models of similar scale. Our model checkpoints and code are publicly available on GitHub at https://github.com/XiaoduoAILab/XmodelLM.
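Balancing two corpora during pre-training usually comes down to sampling each training document from one source or the other with fixed probabilities. The sketch below shows only that sampling step, under stated assumptions. The function `mixture_stream` and the 40/60 Chinese/English weights are hypothetical and are not Xdata's actual mixture. The search over weights guided by downstream tasks is not modeled here.

```python
import random
from itertools import islice
from typing import Dict, Iterator, List, Tuple

# HYPOTHETICAL mixture weights, for illustration only; the report's actual
# Chinese/English balance is not given in this section.
MIX_WEIGHTS: Dict[str, float] = {"zh": 0.4, "en": 0.6}


def mixture_stream(
    corpora: Dict[str, List[str]],
    weights: Dict[str, float],
    seed: int = 0,
) -> Iterator[Tuple[str, str]]:
    """Yield (source, document) pairs forever, choosing the source of each
    document at random in proportion to `weights`.

    Each source is read in order and wraps around when exhausted, so a small
    corpus with a large weight is effectively repeated (upsampled).
    """
    rng = random.Random(seed)  # seeded for reproducible mixtures
    names = list(weights)
    probs = [weights[name] for name in names]
    cursors = {name: 0 for name in names}  # next document index per source
    while True:
        name = rng.choices(names, weights=probs, k=1)[0]
        docs = corpora[name]
        yield name, docs[cursors[name] % len(docs)]
        cursors[name] += 1


if __name__ == "__main__":
    toy = {"zh": ["zh-doc-1", "zh-doc-2"], "en": ["en-doc-1", "en-doc-2", "en-doc-3"]}
    sample = list(islice(mixture_stream(toy, MIX_WEIGHTS), 1000))
    zh_share = sum(src == "zh" for src, _ in sample) / len(sample)
    print(f"share of Chinese documents: {zh_share:.2f}")  # close to 0.40
```

Sampling per document keeps the realized mix close to the target ratio throughout training, rather than only on average over the whole run. This is one reason weighted sampling is a common alternative to simply concatenating corpora.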

Paper Structure

This paper contains 11 sections, 1 equation, 3 figures, and 6 tables.

Figures (3)

  • Figure 1: The trend of training and validation loss during pre-training.
  • Figure 2: Evolution of performance on commonsense reasoning tasks during pre-training.
  • Figure 3: Shifts in the $L_2$-norm of parameters during pre-training.
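Figure 3 tracks how the $L_2$-norm of the parameters shifts during pre-training. This listing does not say whether that is one global norm over all weights or a per-layer norm, so the sketch below computes both. Logging either value at each checkpoint would produce a curve of this kind. To keep the example dependency-free, tensors are represented as flat Python lists. The parameter names and values are hypothetical. In PyTorch, the per-tensor value corresponds to applying `torch.linalg.vector_norm` to each parameter.

```python
import math
from typing import Dict, Sequence

# Toy stand-in for a model checkpoint: each "tensor" is flattened into a
# plain list of floats. Names and values are hypothetical.
Params = Dict[str, Sequence[float]]


def per_tensor_l2_norms(params: Params) -> Dict[str, float]:
    """L2-norm of each named parameter tensor."""
    return {name: math.sqrt(sum(x * x for x in t)) for name, t in params.items()}


def global_l2_norm(params: Params) -> float:
    """L2-norm of all parameters viewed as one long vector, which equals
    sqrt(sum of squared per-tensor norms)."""
    return math.sqrt(sum(x * x for t in params.values() for x in t))


if __name__ == "__main__":
    snapshot = {"embed": [3.0, 4.0], "lm_head": [12.0]}
    print(per_tensor_l2_norms(snapshot))  # {'embed': 5.0, 'lm_head': 12.0}
    print(global_l2_norm(snapshot))       # 13.0
```

The global norm is dominated by the largest tensors, such as embedding matrices, so per-tensor norms are often the more informative view when looking for which part of a model is drifting.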