GOVERN: Gradient Orientation Vote Ensemble for Multi-Teacher Reinforced Distillation

Wenjie Zhou; Zhenxin Ding; Xiaodong Zhang; Haibo Shi; Junfeng Wang; Dawei Yin

TL;DR

A novel algorithm, GOVERN, is proposed to address how knowledge from multiple teacher models can be effectively ensembled during the unsupervised distillation stage without the guidance of labels; it has been successfully deployed in a real-world commercial question-answering system.

Abstract

Pre-trained language models have become an integral component of question-answering systems, achieving remarkable performance. For practical deployment, however, knowledge distillation is crucial to maintain high performance under computational constraints. In this paper, we address a key question: given the importance of unsupervised distillation for student model performance, how can knowledge from multiple teacher models be effectively ensembled during this stage without the guidance of labels? We propose a novel algorithm, GOVERN, to tackle this issue. GOVERN demonstrates significant improvements in both offline and online experiments, enabling the student model to achieve results comparable to those of teacher ensembles. Our experiments show that GOVERN requires only 1\% of the ensemble method's inference budget to achieve 99.5\% of its performance. The proposed algorithm has been successfully deployed in a real-world commercial question-answering system, demonstrating its practical applicability.

Paper Structure (18 sections, 12 equations, 5 figures, 4 tables)

Introduction
Answer Selection Task
Methodology
Unsupervised Distillation
GOVERN
Supervised Distillation: GOVERN-CA
Experiments and Results
Dataset
Experiment Details
Evaluation Metrics
Main Results
Online Experiment
Ablation Study
Related Work
Conclusion
...and 3 more sections

Figures (5)

Figure 1: Procedures of Gradient Orientation Vote Ensemble Reinforced Distillation
Figure 2: The Answer Card is retrieved by the question-answering system. The web pages below it are not displayed in answer card format.
Figure 3: The effect of the number of teachers.
Figure 4: The left panel shows the distribution of our model's output on the test set, and the right panel shows the $\mathrm{Beta}(19.0, 3.0)$ distribution. The model's output follows a distribution similar to this Beta distribution.
Figure 5:
