DistilQwen2.5: Industrial Practices of Training Distilled Open Lightweight Language Models

Chengyu Wang; Junbing Yan; Yuanhao Yue; Jun Huang

TL;DR

The paper tackles the resource constraints of deploying large language models by distilling them into smaller, open-source variants. It introduces DistilQwen2.5, built with a two-stage pipeline: a black-box stage that combines multi-agent data augmentation with chain-of-thought (CoT) guided rewriting, followed by an efficient white-box model fusion stage that transfers knowledge from large teachers to smaller students. Evaluations on AlpacaEval 2.0, MT-Bench, and IFEval show significant instruction-following gains, with the largest improvements for compact backbones. The paper also describes practical deployments, such as SQL completion and cloud-based knowledge distillation (KD) workflows. Overall, the work offers industrially viable strategies for building a spectrum of compact LLMs that achieve strong task performance at lower inference cost, and it releases the DistilQwen2.5 family as open source.

Abstract

Enhancing computational efficiency and reducing deployment costs for large language models (LLMs) have become critical challenges in various resource-constrained scenarios. In this work, we present DistilQwen2.5, a family of distilled, lightweight LLMs derived from the public Qwen2.5 models. These distilled models exhibit enhanced instruction-following capabilities compared to the original models based on a series of distillation techniques that incorporate knowledge from much larger LLMs. In our industrial practice, we first leverage powerful proprietary LLMs with varying capacities as multi-agent teachers to select, rewrite, and refine instruction-response pairs that are more suitable for student LLMs to learn. After standard fine-tuning, we further leverage a computationally efficient model fusion approach that enables student models to progressively integrate fine-grained hidden knowledge from their teachers. Experimental evaluations demonstrate that the distilled models possess significantly stronger capabilities than their original checkpoints. Additionally, we present use cases to illustrate the applications of our framework in real-world scenarios. To facilitate practical use, we have released all the DistilQwen2.5 models to the open-source community.
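
To make the white-box stage more concrete, the following is a minimal sketch of a generic logit-matching distillation loss: the student is penalized by the KL divergence between the teacher's and the student's temperature-softened token distributions. This is not the paper's model fusion procedure, whose details are not given in this section; the function names `softmax` and `kd_loss`, the temperature value, and the toy logits are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) for one token position.

    Hypothetical helper: a generic knowledge-distillation objective,
    not the fusion method described in the paper.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    eps = 1e-12  # guard against log(0) and division by zero
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


# Toy logits over a 4-token vocabulary (illustrative values only).
teacher = [2.0, 1.0, 0.1, -1.0]
student = [0.5, 0.5, 0.5, 0.5]

loss_mismatch = kd_loss(teacher, student)
loss_match = kd_loss(teacher, teacher)
```

In this sketch the loss is zero when the student reproduces the teacher's distribution and grows as the two diverge; a real training loop would average it over token positions and typically combine it with the standard fine-tuning loss.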
