Task-Specific Knowledge Distillation via Intermediate Probes

Ryan Brown, Chris Russell

Abstract

Knowledge distillation from large language models (LLMs) assumes that the teacher's output distribution is a high-quality training signal. On reasoning tasks, this assumption is frequently violated. A model's intermediate representations may encode the correct answer, yet this information is lost or distorted by the vocabulary projection, where prompt formatting and answer-token choices create brittle, noisy outputs. We introduce Probe-KD, a distillation framework that bypasses this bottleneck by training lightweight probes on frozen teacher hidden states and using the probes' predictions, rather than the output logits, as supervision for student training. This simple change yields consistent improvements across four reasoning benchmarks (AQuA-RAT, ARC Easy/Challenge, and MMLU), with gains most pronounced under limited data. Probes trained on intermediate representations provide cleaner labels than the teacher's own outputs, effectively denoising the distillation signal. Probe-KD requires no architectural changes to student or teacher, is architecture-agnostic, and adds minimal compute, since probe training is cheap and teacher representations can be cached. By exploiting internal representations, Probe-KD enables practitioners to extract more value from large teacher models without additional training data or architectural complexity.
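
The pipeline described in the abstract can be sketched in a few lines of dependency-free Python. This is only an illustrative toy under stated assumptions, not the authors' implementation: the probe here is a plain softmax-regression (linear) probe, the "hidden states" are synthetic 4-dimensional vectors standing in for cached teacher activations, and the loss is a generic soft-label cross-entropy. All function names (`train_linear_probe`, `probe_predict`, `distillation_loss`) and all hyperparameters are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def probe_predict(weights, bias, hidden, temperature=1.0):
    """Probe readout: class probabilities decoded from one hidden state."""
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits, temperature)


def train_linear_probe(hidden_states, labels, num_classes, lr=0.5, epochs=200):
    """Fit a softmax-regression probe on cached (frozen) hidden states.

    Only the probe parameters are updated; the teacher is never touched.
    """
    dim, n = len(hidden_states[0]), len(hidden_states)
    weights = [[0.0] * dim for _ in range(num_classes)]
    bias = [0.0] * num_classes
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for hidden, label in zip(hidden_states, labels):
            probs = probe_predict(weights, bias, hidden)
            for c in range(num_classes):
                err = probs[c] - (1.0 if c == label else 0.0)
                grad_b[c] += err / n
                for j in range(dim):
                    grad_w[c][j] += err * hidden[j] / n
        for c in range(num_classes):
            bias[c] -= lr * grad_b[c]
            for j in range(dim):
                weights[c][j] -= lr * grad_w[c][j]
    return weights, bias


def distillation_loss(student_logits, soft_targets, temperature=1.0):
    """Cross-entropy of the student distribution against probe soft labels."""
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(p + 1e-12)
                for t, p in zip(soft_targets, student_probs))


# Toy stand-in for cached teacher hidden states (assumption: 2 answer classes).
rng = random.Random(0)
labels = [i % 2 for i in range(40)]
hidden_states = [[(1.0 if y == 1 else -1.0) + rng.gauss(0.0, 0.5)
                  for _ in range(4)] for y in labels]

# Step 1: train the probe on frozen representations.
weights, bias = train_linear_probe(hidden_states, labels, num_classes=2)

# Step 2: the probe's predictions, not the teacher's logits, become soft labels.
soft_labels = [probe_predict(weights, bias, h) for h in hidden_states]
probe_accuracy = sum(
    1 for probs, y in zip(soft_labels, labels) if probs.index(max(probs)) == y
) / len(labels)

# Step 3: a student would minimise this loss against the probe's soft labels.
example_loss = distillation_loss([0.3, -0.1], soft_labels[0])
```

In a realistic setting the hidden states would be extracted once from a chosen teacher layer and cached, which is why the abstract describes the added compute as minimal: the teacher is run a single time, and only the small probe and the student are trained.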

Paper Structure

This paper contains 24 sections, 2 equations, 3 figures, 7 tables, 1 algorithm.

Figures (3)

  • Figure 1: Probe-KD vs. typical distillation. In standard logit distillation (left), soft labels come from the teacher's output layer, which projects hidden states onto answer tokens via a fixed, task-agnostic readout. This bottleneck produces noisy supervision even when the correct answer is encoded internally. Probe-KD (right) bypasses this bottleneck by training a probe to decode hidden states directly, learning a task-aligned projection. On AQuA-RAT, the probe achieves 52% accuracy versus the teacher's 45%, demonstrating that cleaner readouts yield cleaner labels, which in turn yield more performant students.
  • Figure 2: Data efficiency comparison on AQuA-RAT. Test accuracy (%) as a function of training data percentage. We distill from Qwen2.5-7B-Instruct (teacher) to DeBERTa-v3-base (student, 86M parameters). Probe-KD variants consistently outperform standard distillation baselines across all data regimes, with gains most pronounced in low-data settings.
  • Figure 3: Probe-KD calibration analysis on AQuA-RAT. We plot test accuracy against mean prediction confidence, with the x-axis reversed so that lower confidence (better calibration) appears rightward. Arrows indicate the distillation path from source (Teacher or MLP Probe) to student.