Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and Explainability

Daniel Hendriks, Philipp Spitzer, Niklas Kühl, Gerhard Satzger

TL;DR

This work benchmarks knowledge distillation methods for shrinking large language models with a focus on both performance and explainability. It introduces critique-revision prompting for data generation and combines multitask and counterfactual training to form a standardized comparison framework, evaluated on the Commonsense Question-Answering dataset with a LLaMA-2 teacher and T5 students. Results show multitask training consistently delivers strong accuracy, while applying critique-revision prompting improves explainability, especially when combined with training methods. The study highlights that smaller teachers can approach the performance of much larger models under effective distillation and provides human-centered evaluation to guide practical deployment where explainability is critical.

Abstract

Artificial Intelligence (AI) has increasingly influenced modern society, most recently through significant advances in Large Language Models (LLMs). However, the high computational and storage demands of LLMs still limit their deployment in resource-constrained environments. Knowledge distillation addresses this challenge by training a small student model from a larger teacher model. Previous research has introduced several distillation methods, both for generating training data and for training the student model. Despite their relevance, the effects of state-of-the-art distillation methods on model performance and explainability have not been thoroughly investigated and compared. In this work, we extend the set of available methods by applying critique-revision prompting to distillation for data generation and by synthesizing existing methods for training. For these methods, we provide a systematic comparison based on the widely used Commonsense Question-Answering (CQA) dataset. We measure performance via student model accuracy and evaluate explainability in a human-grounded study. We contribute new distillation methods and their comparison in terms of both performance and explainability. This should further advance the distillation of small language models and thus contribute to broader applicability and faster diffusion of LLM technology.

Paper Structure

This paper contains 19 sections, 12 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Research model overview: This work introduces novel methods for training data generation and training and establishes approaches to compare performance and explainability of student models.
  • Figure 2: Methods presented in the section on preliminaries are applied in two steps. In Step 1, we generate explanations and then improve them by having the teacher critique and revise them. In Step 2, the explanations are used to fine-tune student models with one of three training methods: multitask training, counterfactual training, or a combination of both. We focus on the four student models shown in the table for two reasons. First, preliminary experiments identified them as the most promising candidates in terms of both performance and explainability. Second, limiting the number of student model combinations avoids overloading study participants during the explainability evaluation.
  • Figure 3: In the within-subject study ($N = 117$), we measure the effect of using four different student models on their explanation quality along five dimensions.
  • Figure 4: Comparison of different student models in terms of (a) accuracy performance and (b) explainability measured along the five dimensions of plausibility, understandability, completeness, satisfaction, and contrastiveness. "Quality" as a calculated concept represents the arithmetic average across the five dimensions. Abbreviations: CF for counterfactual training and MT for multitask training.
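
The Figure 4 caption defines "Quality" as the arithmetic average across the five explainability dimensions. A minimal sketch of that calculation is given below. The function name `quality_score`, the dictionary layout, and the example ratings are assumptions made for illustration only; they are not taken from the paper, and the rating scale used in the study is not stated here.

```python
from statistics import mean

# The five explainability dimensions named in the Figure 4 caption.
DIMENSIONS = (
    "plausibility",
    "understandability",
    "completeness",
    "satisfaction",
    "contrastiveness",
)


def quality_score(ratings: dict[str, float]) -> float:
    """Return the arithmetic mean of the five dimension ratings.

    `ratings` maps each dimension name to one rating (hypothetical layout).
    """
    missing = [d for d in DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {', '.join(missing)}")
    return mean(ratings[d] for d in DIMENSIONS)


# Invented ratings for one student model, purely for illustration.
example = {
    "plausibility": 4.0,
    "understandability": 5.0,
    "completeness": 3.0,
    "satisfaction": 4.0,
    "contrastiveness": 4.0,
}
print(quality_score(example))  # 4.0
```

Because every dimension carries equal weight in this average, a model that scores low on one dimension (for example, contrastiveness) can still reach a high overall "Quality" value, so the per-dimension results remain worth reading alongside the aggregate.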