Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts
Yuejiang Liu, Alexandre Alahi
TL;DR
The paper addresses weak-to-strong generalization under large capability gaps by proposing Co-Supervised Learning (CSL), a hierarchical mixture of experts that uses multiple fixed weak teachers to supervise a strong student. It introduces an EM-like framework with teacher assignment and noise reduction, enabling the student to benefit from specialized supervision while rejecting noisy annotations through teacher-student and local-global consistency. Empirical results on OpenAI's weak-to-strong benchmark, ImageNet, and DomainNet show that CSL with multiple specialists and denoising consistently improves performance gap recovery by substantial margins (e.g., over 15% on ImageNet and up to 17% on DomainNet) compared to single-teacher baselines. The work demonstrates a practical pathway to align powerful models using diverse, imperfect supervision and highlights its potential to extend beyond vision tasks in future research.
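For readers unfamiliar with the metric, the short sketch below shows how performance gap recovery is commonly computed. It assumes the usual definition from the weak-to-strong generalization literature: the fraction of the gap between the weak teacher and the strong ceiling (the student trained on ground truth) that the weakly supervised student closes. The function name and the accuracy values in the usage note are hypothetical and are not taken from the paper.

```python
def performance_gap_recovered(weak_acc, weak_to_strong_acc, strong_ceiling_acc):
    """Fraction of the weak-to-ceiling gap closed by the weakly supervised student.

    Assumed definition: (weak_to_strong - weak) / (strong_ceiling - weak).
    All three arguments are accuracies on the same evaluation set.
    """
    gap = strong_ceiling_acc - weak_acc
    if gap <= 0:
        # The metric is undefined when the ceiling does not exceed the weak teacher.
        raise ValueError("strong ceiling must exceed weak teacher accuracy")
    return (weak_to_strong_acc - weak_acc) / gap
```

With illustrative accuracies of 0.60 for the weak teacher, 0.70 for the weakly supervised student, and 0.80 for the strong ceiling, the function returns roughly 0.5, meaning about half of the gap is recovered.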
Abstract
Steering the behavior of a strong model pre-trained on internet-scale data can be difficult due to the scarcity of competent supervisors. Recent studies reveal that, despite supervisory noise, a strong student model may surpass its weak teacher when fine-tuned on specific objectives. Yet the effectiveness of such weak-to-strong generalization remains limited, especially in the presence of large capability gaps. In this paper, we propose to address this challenge by harnessing a diverse set of specialized teachers, instead of a single generalist one, that collectively supervise the strong student. Our approach resembles the classical hierarchical mixture of experts, with two components tailored for co-supervision: (i) we progressively alternate student training and teacher assignment, leveraging the growth of the strong student to identify plausible supervision; (ii) we conservatively enforce teacher-student and local-global consistency, leveraging their dependencies to reject potential annotation noise. We validate the proposed method through visual recognition tasks on the OpenAI weak-to-strong benchmark and additional multi-domain datasets. Our code is available at https://github.com/yuejiangliu/csl.
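The two components described in the abstract, teacher assignment and consistency-based noise rejection, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). It assumes that each weak teacher provides one hard label per sample, that a teacher is "plausible" for a sample when the current student assigns high probability to its label, and that teacher-student consistency means agreement with the student's top prediction. The function names `assign_teachers` and `keep_consistent` are hypothetical, and the local-global consistency check is omitted.

```python
import numpy as np


def assign_teachers(student_probs, teacher_labels):
    """Pick, per sample, the teacher whose label the student finds most plausible.

    student_probs: (N, C) array of student class probabilities.
    teacher_labels: (T, N) integer array, one row of hard labels per weak teacher.
    Returns (teacher_idx, labels), each of shape (N,).
    Assumption: plausibility is the student's probability of the teacher's label.
    """
    n = student_probs.shape[0]
    sample_idx = np.arange(n)
    # plausibility[t, i] = student probability of teacher t's label on sample i
    plausibility = student_probs[sample_idx, teacher_labels]
    teacher_idx = plausibility.argmax(axis=0)
    labels = teacher_labels[teacher_idx, sample_idx]
    return teacher_idx, labels


def keep_consistent(student_probs, labels):
    """Teacher-student consistency (assumed form): keep a sample only if
    its assigned label matches the student's current top prediction."""
    return student_probs.argmax(axis=1) == labels
```

In an alternating scheme of this kind, one would call `assign_teachers` with the current student's predictions, filter the resulting labels with `keep_consistent`, fine-tune the student on the retained samples, and repeat. The training step itself is left out because it depends on the model and framework.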
