Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition
Chuanguang Yang, Xinqiang Yu, Han Yang, Zhulin An, Chengqing Yu, Libo Huang, Yongjun Xu
TL;DR
This work addresses how to balance knowledge transfer from a pool of teachers to a student in visual recognition by formulating multi-teacher KD as a reinforcement learning (RL) problem. An agent observes a state that encodes both teacher performance and teacher–student gaps and outputs per-sample weights $w_i^m$ that scale each teacher's contribution to the KD loss; the agent is updated via policy gradient using rewards derived from the student's performance. The approach achieves state-of-the-art results among multi-teacher KD methods across image classification, object detection, and semantic segmentation on standard benchmarks, with ablation studies highlighting the benefit of jointly considering teacher performance and teacher–student gaps. Overall, MTKD-RL demonstrates that data-driven, sample-wise weighting guided by RL can surpass entropy-based or meta-learning strategies in multi-teacher KD, extends to dense prediction tasks, and scales to large datasets.
Abstract
Multi-teacher Knowledge Distillation (KD) transfers diverse knowledge from a teacher pool to a student network. The core problem of multi-teacher KD is how to balance distillation strengths among the teachers. Most existing methods develop weighting strategies from a single perspective, either teacher performance or teacher-student gaps, and therefore lack comprehensive information for guidance. This paper proposes Multi-Teacher Knowledge Distillation with Reinforcement Learning (MTKD-RL) to optimize multi-teacher weights. In this framework, we encode both teacher performance and teacher-student gaps as the state information provided to an agent. The agent outputs the teacher weights and is updated using the reward returned from the student. MTKD-RL reinforces the interaction between the student and the teachers through an agent in an RL-based decision mechanism, achieving better matching capability with more meaningful weights. Experimental results on visual recognition tasks, including image classification, object detection, and semantic segmentation, demonstrate that MTKD-RL achieves state-of-the-art performance compared with existing multi-teacher KD methods.
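To make the role of the teacher weights concrete, the following is a minimal NumPy sketch of a per-sample weighted multi-teacher KD loss. It is an illustration under stated assumptions, not the paper's implementation: the function name `weighted_multi_teacher_kd`, the temperature `T`, the use of KL divergence on softened logits, and the reading of $w_i^m$ as the weight for sample $i$ and teacher $m$ are all assumptions introduced here. The agent that produces the weights and its policy-gradient update are deliberately left out.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def weighted_multi_teacher_kd(student_logits, teacher_logits, weights, T=4.0):
    """Per-sample weighted KD loss over M teachers (illustrative sketch).

    student_logits: (N, C) logits of the student
    teacher_logits: (M, N, C) logits of the M teachers
    weights:        (N, M) hypothetical agent outputs, weights[i, m] ~ w_i^m
    T:              softmax temperature (assumed hyperparameter)
    """
    p_s = softmax(student_logits / T)  # (N, C)
    p_t = softmax(teacher_logits / T)  # (M, N, C)
    eps = 1e-12  # avoids log(0)
    # KL(teacher || student) for every teacher and sample, shape (M, N)
    kl = (p_t * (np.log(p_t + eps) - np.log(p_s + eps)[None])).sum(axis=-1)
    # Weight each teacher's term per sample, sum over teachers, average over samples
    per_sample = (weights.T * kl).sum(axis=0)  # (N,)
    return float(per_sample.mean() * T * T)


# Usage: two teachers, five samples, three classes, uniform weights.
rng = np.random.default_rng(0)
student = rng.normal(size=(5, 3))
teachers = rng.normal(size=(2, 5, 3))
uniform = np.full((5, 2), 0.5)
loss = weighted_multi_teacher_kd(student, teachers, uniform)
```

In this sketch, replacing `uniform` with weights produced by a learned agent is the only change needed to move from equal weighting to sample-wise weighting; how the agent is trained from the student's reward is described by the paper and is not reproduced here.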
