LLM on a Budget: Active Knowledge Distillation for Efficient Classification of Large Text Corpora
Viviana Luccioli, Rithika Iyengar, Ryan Panley, Flora Haberkorn, Xiaoyu Ge, Leland Crane, Nitish Sinha, Seung Jung Lee
TL;DR
The paper tackles the cost barrier of deploying large language models for text classification by combining knowledge distillation with active learning. It introduces M-RARU, a multi-class randomized accept/reject uncertainty sampling strategy that selects only the most informative unlabeled examples for LLM labeling, so that lightweight student models can be trained with far fewer API calls. Across two real-world datasets and five student architectures, M-RARU consistently outperforms random sampling, needing up to 80% fewer labeled samples while improving accuracy and balanced accuracy. The approach combines embedding-based representations, uncertainty-driven querying, and interpretable downstream models to enable fast, cost-effective deployment of LLM-informed classifiers in resource-constrained settings.
Abstract
Large Language Models (LLMs) are highly accurate in classification tasks; however, their substantial computational and financial costs hinder large-scale deployment in dynamic environments. Knowledge Distillation (KD), where an LLM "teacher" trains a smaller and more efficient "student" model, offers a promising solution to this problem. However, the distillation process itself often remains costly for large datasets, since it requires the teacher to label a vast number of samples while incurring significant token consumption. To alleviate this challenge, in this work we explore active learning (AL) as a way to create efficient student models at a fraction of the cost while preserving the LLM's performance. In particular, we introduce M-RARU (Multi-class Randomized Accept/Reject Uncertainty Sampling), a novel AL algorithm that significantly reduces training costs. M-RARU combines uncertainty with a randomized accept-reject mechanism to select only the most informative data points for the LLM teacher. This focused approach significantly reduces the required API calls and data processing time. We evaluate M-RARU against random sampling across five diverse student models (SVM, LDA, RF, GBDT, and DistilBERT) on multiple benchmark datasets. Experiments demonstrate that our proposed method achieves up to an 80% reduction in sample requirements compared to random sampling, substantially improving classification accuracy while reducing financial costs and overall training time.
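
The abstract does not spell out M-RARU's acceptance rule, so the following is only an illustrative sketch of the general idea of randomized accept/reject uncertainty sampling, not the authors' algorithm. It assumes a least-confidence uncertainty score scaled to [0, 1] and an acceptance probability equal to that score; the function names `uncertainty` and `accept_reject_select` and the `budget` parameter are hypothetical.

```python
import random


def uncertainty(probs):
    """Least-confidence score scaled to [0, 1] for K >= 2 classes.

    Assumption: this scoring rule is chosen for illustration only.
    It is 0 for a one-hot prediction and 1 for a uniform prediction.
    """
    k = len(probs)
    return (1.0 - max(probs)) * k / (k - 1)


def accept_reject_select(pool_probs, budget, rng):
    """Pick up to `budget` pool indices to send to the LLM teacher.

    Candidates are visited in random order. Each one is accepted with
    probability equal to its uncertainty (an assumed rule), so confident
    predictions are usually skipped and uncertain ones usually kept.
    """
    order = list(range(len(pool_probs)))
    rng.shuffle(order)
    selected = []
    for idx in order:
        if len(selected) >= budget:
            break
        # Randomized accept/reject step: draw u in [0, 1) and compare.
        if rng.random() < uncertainty(pool_probs[idx]):
            selected.append(idx)
    return selected


# Student-predicted class probabilities for an unlabeled pool (made-up values).
pool = [
    [0.98, 0.01, 0.01],  # confident: rarely accepted
    [0.40, 0.35, 0.25],  # uncertain: often accepted
    [0.34, 0.33, 0.33],  # near-uniform: almost always accepted
    [0.90, 0.05, 0.05],  # fairly confident
]
selected = accept_reject_select(pool, budget=2, rng=random.Random(0))
```

In a full loop under these assumptions, the selected indices would be labeled by the LLM teacher, added to the student's training set, and the student retrained before the next round of selection.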
