Black-Box On-Policy Distillation of Large Language Models

Tianzhu Ye, Li Dong, Zewen Chi, Xun Wu, Shaohan Huang, Furu Wei

TL;DR

The paper tackles black-box distillation of large language models by introducing Generative Adversarial Distillation (GAD), which enables on-policy learning without access to teacher logits or parameters. GAD treats the student as a generator and trains a continually adapting discriminator that serves as an on-policy reward model. The resulting minimax game yields stable feedback and better generalization than sequence-level knowledge distillation (SeqKD). Across multiple teacher-student pairs and datasets, GAD matches or approaches teacher performance, with notable gains in out-of-distribution settings that are supported by human evaluation. The work shows that joint, on-policy adversarial training can distill closed-source LLMs into smaller students while preserving global stylistic and reasoning capabilities, offering a practical approach to black-box distillation.

Abstract

Black-box distillation creates student large language models (LLMs) by learning from a proprietary teacher model's text outputs alone, without access to its internal logits or parameters. In this work, we introduce Generative Adversarial Distillation (GAD), which enables on-policy and black-box distillation. GAD frames the student LLM as a generator and trains a discriminator to distinguish its responses from the teacher LLM's, creating a minimax game. The discriminator acts as an on-policy reward model that co-evolves with the student, providing stable, adaptive feedback. Experimental results show that GAD consistently surpasses the commonly used sequence-level knowledge distillation. In particular, Qwen2.5-14B-Instruct (student) trained with GAD becomes comparable to its teacher, GPT-5-Chat, on the LMSYS-Chat automatic evaluation. The results establish GAD as a promising and effective paradigm for black-box LLM distillation.

Paper Structure

This paper contains 36 sections, 7 equations, 11 figures, 4 tables, 1 algorithm.

Figures (11)

  • Figure 1: Comparison between GAD and sequence-level knowledge distillation (SeqKD) trained on the LMSYS-Chat dataset, evaluated by averaged GPT-4o scores. Left: Results on the LMSYS-Chat test set. Right: Average performance across the Dolly, SelfInst, and Vicuna datasets.
  • Figure 2: Training procedure of GAD. The student (generator) learns to generate responses that maximize the score assigned by the discriminator. The discriminator is trained with Bradley-Terry loss to assign a lower score to the student than the teacher, learning to distinguish between them. Together, they form a two-player minimax game in an adversarial learning framework.
  • Figure 3: Human evaluation results on the LMSYS-Chat-1M-Clean test set. We compare GAD to the instruct model before distillation and the model fine-tuned with SeqKD.
  • Figure 4: Overlap of local patterns between the student and the teacher. SeqKD tends to overfit to local patterns of the teacher.
  • Figure 5: Black-box distillation on toy data. GAD learns reachable modes from the teacher while SeqKD aims to cover all the modes.
  • ...and 6 more figures
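
The Figure 2 caption describes a discriminator trained with a Bradley-Terry loss to score the teacher above the student, while the student is rewarded by the discriminator's score. The sketch below illustrates that idea on scalar scores only. It is a minimal illustration under assumptions, not the paper's implementation: the function names `bradley_terry_loss` and `student_reward` are hypothetical, the scores are plain floats rather than outputs of a neural discriminator, and the policy-gradient update of the student is omitted.

```python
import math


def bradley_terry_loss(teacher_score: float, student_score: float) -> float:
    """Discriminator loss for one (teacher, student) response pair.

    Standard Bradley-Terry form: -log(sigmoid(teacher_score - student_score)).
    The loss shrinks as the teacher response is scored further above the
    student response.
    """
    margin = teacher_score - student_score
    # Numerically stable evaluation of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def student_reward(student_score: float) -> float:
    """Reward for the student (generator).

    Assumption for illustration: the reward is the discriminator's scalar
    score of the student's own response, which the student tries to raise.
    """
    return student_score


if __name__ == "__main__":
    # Hypothetical scalar scores from a discriminator for one prompt.
    teacher_score, student_score = 1.5, -0.5
    print("discriminator loss:", bradley_terry_loss(teacher_score, student_score))
    print("student reward:", student_reward(student_score))
```

The two objectives pull in opposite directions: the discriminator lowers its loss by widening the margin between teacher and student scores, while the student raises its reward by producing responses the discriminator scores higher. That opposition is the two-player minimax game the caption refers to.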