Leveraging Zero-Shot Prompting for Efficient Language Model Distillation

Lukas Vöge, Vincent Gurgul, Stefan Lessmann

TL;DR

This paper introduces a novel approach for efficiently distilling LLMs into smaller, application-specific models, significantly reducing operational costs and manual labor. It also investigates how the properties of explanations affect distillation efficiency.

Abstract

This paper introduces a novel approach for efficiently distilling LLMs into smaller, application-specific models, significantly reducing operational costs and manual labor. To address the challenge of deploying computationally intensive LLMs in specific applications or on edge devices, the technique uses the reasoning capabilities of LLMs to generate labels and natural language rationales for unlabeled data. Our approach enhances both finetuning and distillation through a multi-task training framework in which student models learn to reproduce these rationales alongside the teacher's predictions. Key contributions include the use of zero-shot prompting to elicit teacher model rationales, which reduces the need for handcrafted few-shot examples and lowers the overall token count required; this translates directly into cost savings under the pay-per-token billing of the LLM APIs offered by major tech companies. Additionally, the paper investigates how properties of the explanations affect distillation efficiency, showing that performance loss is minimal even when rationale augmentation is applied to only part of the dataset, which enables further token savings. This research marks a step toward the efficient training of task-specific models with minimal human intervention, offering substantial cost savings while maintaining, or even enhancing, performance.
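
The multi-task setup described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the prompt wording, the "[label]"/"[rationale]" task prefixes (modelled on the task-prefix design of Hsieh et al., 2023), the loss weight `lam`, the whitespace token count and the price are all assumptions, and `TeacherOutput`, `build_multitask_examples`, `multitask_loss`, `select_for_rationales` and `estimate_cost` are hypothetical names. The loss uses the weighted-sum form L = L_label + lam * L_rationale from Hsieh et al. (2023); the paper's own equation may differ in detail.

```python
"""Minimal sketch of zero-shot rationale distillation data prep (illustrative only)."""
import random
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical zero-shot prompt for an NLI task such as ANLI: an instruction
# only, with no handcrafted few-shot demonstrations prepended.
ZERO_SHOT_TEMPLATE = (
    "Premise: {premise}\n"
    "Hypothesis: {hypothesis}\n"
    "Explain your reasoning step by step, then answer with one label: "
    "entailment, neutral, or contradiction."
)

# Hypothetical few-shot demonstration block, used only for the token comparison.
FEW_SHOT_DEMOS = (
    "Premise: A man is playing a guitar on stage.\n"
    "Hypothesis: A person is performing music.\n"
    "Reasoning: Playing a guitar on stage is a musical performance.\n"
    "Label: entailment\n\n"
) * 3


@dataclass
class TeacherOutput:
    label: str
    rationale: Optional[str]  # None if this example was not rationale-augmented


def build_multitask_examples(
    inputs: List[str], outputs: List[TeacherOutput]
) -> List[Tuple[str, str]]:
    """Create (source, target) pairs for a text-to-text student model.

    Every input yields a label task; inputs with a rationale also yield a
    rationale task. The '[label]'/'[rationale]' prefixes are assumed names.
    """
    examples = []
    for text, out in zip(inputs, outputs):
        examples.append(("[label] " + text, out.label))
        if out.rationale is not None:
            examples.append(("[rationale] " + text, out.rationale))
    return examples


def multitask_loss(label_loss: float, rationale_loss: float, lam: float = 1.0) -> float:
    """Weighted sum L = L_label + lam * L_rationale (lam is an assumed weight)."""
    return label_loss + lam * rationale_loss


def select_for_rationales(n: int, fraction: float, seed: int = 0) -> Set[int]:
    """Pick which of n examples receive rationale augmentation."""
    rng = random.Random(seed)
    return set(rng.sample(range(n), round(n * fraction)))


def prompt_tokens(prompt: str) -> int:
    """Crude whitespace token count; real APIs bill with their own tokenizers."""
    return len(prompt.split())


def estimate_cost(prompts: List[str], usd_per_1k_tokens: float) -> Tuple[int, float]:
    """Return (total prompt tokens, cost) under a hypothetical per-token price."""
    total = sum(prompt_tokens(p) for p in prompts)
    return total, total / 1000 * usd_per_1k_tokens


if __name__ == "__main__":
    pairs = [("A dog runs in a park.", "An animal is outside.")] * 4
    zero_shot = [ZERO_SHOT_TEMPLATE.format(premise=p, hypothesis=h) for p, h in pairs]
    few_shot = [FEW_SHOT_DEMOS + z for z in zero_shot]
    print("zero-shot:", estimate_cost(zero_shot, 0.5))
    print("few-shot: ", estimate_cost(few_shot, 0.5))
```

Because the demonstration block is prepended to every request, each few-shot prompt costs more input tokens than its zero-shot counterpart; the real savings depend on the provider's tokenizer and prices, which this sketch does not model. Passing a `fraction` below 1.0 to `select_for_rationales` mirrors the idea of augmenting only part of the dataset with rationales, so the remaining examples contribute only a label task.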

Paper Structure

This paper contains 14 sections, 1 equation, 11 figures, and 4 tables.

Figures (11)

  • Figure 1: Step-by-step distillation as proposed by Hsieh et al. (2023)
  • Figure 2: Overview of the proposed zero-shot step-by-step distillation
  • Figure 3: Overview of the OPRO framework by Yang et al. (2023)
  • Figure 4: OPRO progression on ANLI1
  • Figure 5: Evaluation accuracy of finetuned student models on varying training set sizes
  • ...and 6 more figures