
Mini-Giants: "Small" Language Models and Open Source Win-Win

Zhengping Zhou, Lezhi Li, Xinxi Chen, Andy Li

TL;DR

This article argues that "mini-giants" (open-source, instruction-following LLMs with roughly 10B parameters or fewer) offer a practical alternative to giant proprietary models because they are adaptable, controllable, and affordable. It surveys parameter-reduction strategies (Chinchilla, LLaMA) and parameter-efficient fine-tuning methods (Adapter, Prefix, LoRA, QLoRA, ControlNet), and catalogs a spectrum of open-source LMs trained on synthetic or human-curated data (Alpaca, Vicuna, Dolly, Guanaco, Open Assistant). It highlights the challenges of evaluating these models and the real-world applications where privacy and local computation matter, illustrated by cognitive behavioral therapy (CBT) applications such as Woebot. The paper concludes that open-source mini-giants can democratize access to AI, enabling domain-specific adaptation and safer, governance-friendly deployments.
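Among the parameter-efficient fine-tuning methods listed above, LoRA shows especially clearly why adapting a model can cost far less than training it. The following is a minimal pure-Python sketch, assuming the standard LoRA formulation: a frozen weight W plus a trainable rank-r update B·A, scaled by alpha/r. The function names and layer sizes are hypothetical and illustrative only. This is not code from the paper or from any library.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def full_update_params(d: int, k: int) -> int:
    """Trainable parameters when a d x k weight matrix is fine-tuned directly."""
    return d * k


def lora_update_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for a rank-r LoRA update delta_W = B @ A,
    where B is d x r and A is r x k; the pretrained W stays frozen."""
    return r * (d + k)


def matvec(M: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product over nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(x: Vector, W: Matrix, A: Matrix, B: Matrix,
                 alpha: float, r: int) -> Vector:
    """Compute h = W x + (alpha / r) * B (A x).

    W (d x k) is frozen; only A (r x k) and B (d x r) would be trained.
    LoRA initializes B to zeros, so the adapted layer starts out
    identical to the pretrained one.
    """
    base = matvec(W, x)               # frozen pretrained path
    update = matvec(B, matvec(A, x))  # trainable low-rank path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


if __name__ == "__main__":
    d = k = 4096  # hypothetical layer size
    r = 8         # hypothetical LoRA rank
    print(full_update_params(d, k))     # 16777216
    print(lora_update_params(d, k, r))  # 65536
```

With these illustrative sizes, the low-rank update trains 65,536 parameters instead of 16,777,216 for a single matrix. That is 1/256, or about 0.39%, of a full update. QLoRA keeps this structure and additionally quantizes the frozen base weights.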

Abstract

ChatGPT is phenomenal. However, it is prohibitively expensive to train and refine such giant models. Fortunately, small language models are flourishing and becoming increasingly competent. We call them "mini-giants". We argue that the open-source community (e.g., Kaggle) and mini-giants will create a win-win in many ways: technically, ethically, and socially. In this article, we present a brief yet rich background, discuss how to attain small language models, present a comparative study of small language models along with a brief discussion of evaluation methods, discuss the real-world application scenarios where small language models are most needed, and conclude with a discussion and outlook.

Paper Structure (44 sections, 1 equation, 1 figure, 4 tables)

Figures (1)

  • Figure 1: An evolution tree of recently released instruction-following small LMs. The color of the text boxes indicates the openness of the license under which the models are released: red stands for proprietary licenses, yellow stands for non-commercial licenses, and green stands for licenses permissive for commercial use.