Evolving Knowledge Distillation with Large Language Models and Active Learning

Chengyuan Liu, Yangyang Kang, Fubang Zhao, Kun Kuang, Zhuoren Jiang, Changlong Sun, Fei Wu

TL;DR

EvoKD tackles the high cost of deploying large language models (LLMs) by distilling their task proficiency into small domain models through an active-learning-inspired, LLM-driven data-generation loop. It identifies the student’s weaknesses, has the LLM generate diverse easy and hard labeled samples, and uses iterative feedback to refine the student, achieving strong 1-shot and few-shot results on text classification and NER. The approach combines weakness-aware guidance with sample diversification (easy vs. hard samples, repeated batches, review history) to overcome the limits of static, offline KD pipelines and label noise while remaining data-efficient. EvoKD reaches up to 90% of full-shot performance with only 1-shot data on several datasets, which makes it practical for low-resource scenarios and professional-domain tasks.

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities across various NLP tasks. However, their computational costs are prohibitively high. To address this issue, previous research has attempted to distill the knowledge of LLMs into smaller models by generating annotated data. Nonetheless, these works have mainly focused on the direct use of LLMs for text generation and labeling, without fully exploring their potential to comprehend the target task and acquire valuable knowledge. In this paper, we propose EvoKD: Evolving Knowledge Distillation, which leverages the concept of active learning to interactively enhance the process of data generation using large language models, while simultaneously improving the task capabilities of a small domain model (the student model). Unlike previous work, we actively analyze the student model's weaknesses and then synthesize labeled samples based on the analysis. In addition, we provide iterative feedback to the LLMs regarding the student model's performance to continuously construct diversified and challenging samples. Experiments and analysis on different NLP tasks, namely text classification and named entity recognition, show the effectiveness of EvoKD.

Paper Structure (34 sections, 5 figures, 6 tables, 1 algorithm)

Figures (5)

  • Figure 1: Relationship between KD, DA and DG.
  • Figure 2: Framework of EvoKD. The initial student model is trained on the few-shot training data. Correct and wrong samples are then identified in the "Evaluation" step, and these results are used iteratively to distill new samples. Each Evolving Knowledge Distillation iteration begins by prompting the LLM to analyze the weaknesses of the student model, given the correct and wrong samples. Based on these weaknesses, the LLM is asked to generate a set of challenging and easy samples, which are collected into a batch. The student model is first evaluated on the batch to obtain the next round of feedback, and is then trained on it.
  • Figure 3: F1 versus the number of tokens used during training.
  • Figure 4: EvoKD concentrates on the samples with lower performance. Note that the upper sub-figure shows the cumulative number of samples in each category. A rising trend means the LLM generates more samples of that category, while a horizontal segment indicates that the category is absent from the generated data.
  • Figure 5: Example of evolving active learning for a sentiment classification task.
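
The loop described in the Figure 2 caption can be sketched in a few lines of Python. This is only an illustrative sketch under loud assumptions, not the authors' implementation: `KeywordStudent` is a toy word-count classifier standing in for the student model, `stub_llm` is a hypothetical placeholder for the prompted LLM (a real system would send the correct and wrong samples to the LLM, ask for a weakness analysis, and parse the generated samples), and `evokd_loop` and `evaluate` are names invented here. The one detail taken from the caption is the ordering: each generated batch is used to evaluate the student first, and only then to train it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str]  # (text, label)


@dataclass
class KeywordStudent:
    """Toy stand-in for the student model (assumption): counts word -> label votes."""
    votes: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def train(self, batch: List[Sample]) -> None:
        for text, label in batch:
            for word in text.lower().split():
                counts = self.votes.setdefault(word, {})
                counts[label] = counts.get(label, 0) + 1

    def predict(self, text: str) -> str:
        tally: Dict[str, int] = {}
        for word in text.lower().split():
            for label, n in self.votes.get(word, {}).items():
                tally[label] = tally.get(label, 0) + n
        if not tally:
            return "unknown"
        # Ties are broken alphabetically so the result is deterministic.
        return max(sorted(tally), key=tally.get)


def evaluate(student: KeywordStudent, batch: List[Sample]):
    """Split a batch into samples the student gets right and wrong."""
    correct, wrong = [], []
    for text, label in batch:
        (correct if student.predict(text) == label else wrong).append((text, label))
    return correct, wrong


def stub_llm(correct: List[Sample], wrong: List[Sample]):
    """Hypothetical placeholder for the LLM call; returns (easy, hard) samples.

    It ignores its inputs and returns fixed data so the sketch runs offline.
    """
    easy = [("great movie", "pos"), ("awful movie", "neg")]
    hard = [("not great at all", "neg")]
    return easy, hard


def evokd_loop(
    student: KeywordStudent,
    seed: List[Sample],
    llm_generate: Callable[[List[Sample], List[Sample]], Tuple[List[Sample], List[Sample]]],
    rounds: int,
) -> List[int]:
    """Run the evaluate -> generate -> evaluate -> train loop; return errors per round."""
    student.train(seed)                        # initial few-shot training
    correct, wrong = evaluate(student, seed)   # first feedback
    errors_per_round = []
    for _ in range(rounds):
        easy, hard = llm_generate(correct, wrong)  # weakness-guided generation
        batch = easy + hard
        correct, wrong = evaluate(student, batch)  # evaluate BEFORE training
        student.train(batch)                       # then train on the same batch
        errors_per_round.append(len(wrong))
    return errors_per_round


seed_data = [("great film", "pos"), ("awful film", "neg")]
print(evokd_loop(KeywordStudent(), seed_data, stub_llm, rounds=2))  # [1, 0]
```

In this toy run the student misclassifies the hard sample in the first round and gets it right in the second, which mirrors the feedback idea in miniature; it says nothing about the behavior of the actual method.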