Simplified TinyBERT: Knowledge Distillation for Document Retrieval
Xuanang Chen, Ben He, Kai Hui, Le Sun, Yingfei Sun
TL;DR
This work addresses the high computational cost of BERT-based document ranking. It first evaluates standard knowledge distillation (KD) and TinyBERT on the ranking task, then introduces Simplified TinyBERT, which applies two simplifications: merging the task-specific distillation into a single step and incorporating hard labels. The result is a smaller, faster re-ranker that often outperforms BERT-Base, with speedups of up to 15× on MS MARCO development data. The key contributions are an empirical assessment of KD for ranking and two modifications that improve both training efficiency and ranking quality. The results demonstrate the practicality of KD for document retrieval and suggest further exploration with more advanced models.
Abstract
Despite the effectiveness of the BERT model for document ranking, the high computational cost of such approaches limits their use. To this end, this paper first empirically investigates the effectiveness of two knowledge distillation models on the document ranking task. In addition, two simplifications are proposed on top of the recently proposed TinyBERT model. Evaluations on two widely used benchmarks demonstrate that Simplified TinyBERT with the proposed simplifications not only improves on TinyBERT, but also significantly outperforms BERT-Base while providing a 15$\times$ speedup.
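
The two simplifications can be illustrated with a minimal sketch of a single-step distillation objective. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the loss formula, so the unweighted sum of three terms, the choice of mean squared error for the intermediate term, and all function and parameter names below are hypothetical. It also assumes the student and teacher hidden vectors have already been mapped to the same dimension.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy of the student against the teacher's soft labels."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(teacher_probs, student_probs))


def hard_cross_entropy(student_logits, label):
    """Cross-entropy of the student against the ground-truth (hard) label."""
    return -math.log(softmax(student_logits)[label])


def mse(student_vec, teacher_vec):
    """Mean squared error between two equal-length vectors."""
    return sum((s - t) ** 2 for s, t in zip(student_vec, teacher_vec)) / len(student_vec)


def one_step_distillation_loss(student_logits, teacher_logits, label,
                               student_hidden, teacher_hidden, temperature=1.0):
    """Sum intermediate, soft-label, and hard-label terms in one objective.

    Assumption: equal weighting of the three terms. A two-step scheme would
    instead optimize the intermediate term first and the prediction term after.
    """
    intermediate = mse(student_hidden, teacher_hidden)
    soft = soft_cross_entropy(student_logits, teacher_logits, temperature)
    hard = hard_cross_entropy(student_logits, label)
    return intermediate + soft + hard


# Toy query-document pair: two classes (relevant = 0, non-relevant = 1).
loss = one_step_distillation_loss(
    student_logits=[2.0, -1.0],
    teacher_logits=[3.0, -2.0],
    label=0,
    student_hidden=[0.1, 0.2],
    teacher_hidden=[0.3, 0.1],
)
print(f"combined loss: {loss:.4f}")
```

Because all three terms are summed, a single optimization pass updates the student on intermediate representations, teacher predictions, and ground-truth relevance labels at once, which is the intuition behind both simplifications.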
