PhoneLM: an Efficient and Capable Small Language Model Family through Principled Pre-training

Rongjie Yi, Xiang Li, Weikai Xie, Zhenyan Lu, Chenghua Wang, Ao Zhou, Shangguang Wang, Xiwen Zhang, Mengwei Xu

TL;DR

This work presents a simple yet effective principle for SLM design: search for an architecture with (near-)optimal runtime efficiency before pre-training. Following this principle, it develops the PhoneLM SLM family, which achieves a state-of-the-art capability-efficiency tradeoff among models of similar parameter size.

Abstract

Interest in developing small language models (SLMs) for on-device deployment is growing fast. However, existing SLM designs rarely consider the hardware characteristics of the target device. This work instead presents a simple yet effective principle for SLM design: search for an architecture with (near-)optimal runtime efficiency before pre-training. Guided by this principle, we develop the PhoneLM SLM family (currently with 0.5B and 1.5B versions), which achieves a state-of-the-art capability-efficiency tradeoff among models of similar parameter size. We fully open-source the code, weights, and training datasets of PhoneLM for reproducibility and transparency, including both base and instructed versions. We also release a fine-tuned version of PhoneLM capable of accurate Android Intent invocation, and an end-to-end Android demo. All materials are available at https://github.com/UbiquitousLearning/PhoneLM.
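
The principle stated in the abstract (fix a parameter budget, then pick the architecture that runs fastest on the target device before spending any compute on pre-training) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the PhoneLM code: the `Arch` tuple, the rough parameter formula in `approx_params`, the candidate grid, and the `measure` callback, which stands in for a real on-device throughput benchmark.

```python
from itertools import product
from typing import Callable, NamedTuple


class Arch(NamedTuple):
    """Hypothetical architecture description used only in this sketch."""
    layers: int
    hidden: int
    ffn: int


def approx_params(a: Arch, vocab: int = 32000) -> int:
    """Rough decoder-only parameter count (assumed formula; norms and biases ignored)."""
    attn = 4 * a.hidden * a.hidden   # Q, K, V and output projections
    mlp = 3 * a.hidden * a.ffn       # gated MLP: gate, up and down projections
    return a.layers * (attn + mlp) + vocab * a.hidden  # plus a tied embedding table


def search(budget: int, tolerance: float, measure: Callable[[Arch], float]) -> Arch:
    """Return the in-budget candidate with the highest measured throughput."""
    # Illustrative grid; a real search would cover more dimensions (heads, activation, ...).
    grid = product((16, 20, 24), (1536, 2048, 2560), (4096, 6144, 8192))
    candidates = [Arch(layers, hidden, ffn) for layers, hidden, ffn in grid]
    in_budget = [
        a for a in candidates
        if abs(approx_params(a) - budget) <= tolerance * budget
    ]
    if not in_budget:
        raise ValueError("no candidate architecture fits the parameter budget")
    # `measure` should return tokens per second on the target device (higher is better).
    return max(in_budget, key=measure)


if __name__ == "__main__":
    # Stand-in benchmark that simply favors shallower models; replace it with
    # a real prefill/decode throughput measurement on the phone.
    best = search(1_500_000_000, 0.1, measure=lambda a: 1.0 / a.layers)
    print(best, approx_params(best))
```

Only the selected architecture would then be pre-trained, so the cost of the search is bounded by the number of benchmark runs rather than by training runs.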

Paper Structure

This paper contains 17 sections, 2 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: An end-to-end Android demo of PhoneLM's capability. (a) shows a user having a conversation with PhoneLM-1.5B-Instruct; (b) shows a user invoking an Android intent by chatting with PhoneLM-1.5B-Call.
  • Figure 2: Comparison of average accuracy and runtime performance between PhoneLM-1.5B and SLMs of similar parameter size (1B to 2B). The average accuracy is computed over seven NLP tasks chosen to reflect model ability (the same tasks as in the paper's performance table), and prefill/decode throughput is measured on the CPU of a Xiaomi 14 mobile phone. The closer a model is to the upper right corner, the better. Solid dots mark models whose training data is open source; hollow dots mark models whose training data is closed source.
  • Figure 3: Comparison of throughput and ability for models with 100M and 200M parameters. More details of these model architectures are given in the appendix.
  • Figure 4: Training loss
  • Figure 5: PhoneLM's performance across training iterations on standard zero-shot tasks
  • ...and 1 more figure