Collaboration of Teachers for Semi-supervised Object Detection
Liyu Chen, Huaao Tang, Yi Wen, Hanting Chen, Wei Li, Junchao Liu, Jie Hu
TL;DR
This work tackles teacher–student weight coupling and confirmation bias in semi-supervised object detection (SSOD). It introduces the Collaboration of Teachers Framework (CTF), which trains multiple decoupled teacher–student pairs, together with a Data Performance Consistency Optimization (DPCO) module. The burn-in phase yields teachers with diverse perspectives. During two-stage training, DPCO identifies the best teacher by its accumulated loss on labeled data, and that teacher's reliable pseudo-labels guide the other pairs. Training minimizes $L_{total} = L_l + \lambda_u L_u + \beta L_{DPC}$ with $\beta = 2$, and each teacher is updated from its student by $W_t \leftarrow (1-\alpha)W_t + \alpha W_s$. Empirically, CTF with DPCO improves mAP on the COCO-PARTIAL and VOC-PARTIAL benchmarks (+0.71 mAP on 10%-labeled COCO and +0.89 mAP on VOC over LabelMatch) and converges faster than prior SSOD methods, while remaining plug-and-play with existing approaches. The approach reduces confirmation bias, improves the use of unlabeled data, and offers a scalable, generalizable alternative to EMA-based single-teacher SSOD.
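The loss and teacher update rule in the TL;DR can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: model weights are modeled as a hypothetical dict of scalar parameters, and the helper names `ema_update` and `total_loss` are our own. The values of $\lambda_u$ and $\alpha$ are not given here, so they are left as arguments; only $\beta = 2$ comes from the text.

```python
from typing import Dict


def ema_update(teacher: Dict[str, float], student: Dict[str, float],
               alpha: float) -> Dict[str, float]:
    # Hypothetical helper: per-parameter EMA in the form stated in the text,
    # W_t <- (1 - alpha) * W_t + alpha * W_s
    return {name: (1.0 - alpha) * w_t + alpha * student[name]
            for name, w_t in teacher.items()}


def total_loss(l_l: float, l_u: float, l_dpc: float,
               lambda_u: float, beta: float = 2.0) -> float:
    # Hypothetical helper: L_total = L_l + lambda_u * L_u + beta * L_DPC,
    # with beta defaulting to 2 as stated in the text
    return l_l + lambda_u * l_u + beta * l_dpc
```

In this form of the update, $\alpha$ weights the student, so a small $\alpha$ keeps each teacher changing slowly relative to its student.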
Abstract
Recent semi-supervised object detection (SSOD) methods have achieved remarkable progress by leveraging unlabeled data for training. Mainstream SSOD methods rely on consistency regularization and the Exponential Moving Average (EMA), which together form a cyclic data flow. However, EMA updating couples the weights of the teacher and student models. Within a cyclic data flow, this coupling reduces the utilization of information in unlabeled data and causes confirmation bias toward low-quality or erroneous pseudo-labels. To address these issues, we propose the Collaboration of Teachers Framework (CTF), which trains multiple pairs of teacher and student models. During training, the Data Performance Consistency Optimization (DPCO) module identifies the teacher that has produced the best pseudo-labels over the past training process, and the most reliable pseudo-labels, generated by this best-performing teacher, guide the other student models. As a result, the framework greatly improves the utilization of unlabeled data and prevents a positive feedback loop of unreliable pseudo-labels. CTF achieves outstanding results on several SSOD datasets, including a 0.71% mAP improvement on the 10%-annotated COCO dataset and a 0.89% mAP improvement on the VOC dataset compared to LabelMatch, and it converges significantly faster. Moreover, CTF is plug-and-play and can be integrated with other mainstream SSOD methods.
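The best-teacher selection described above can be sketched as a running tally. This assumes, following the TL;DR, that DPCO ranks teachers by their accumulated loss on labeled data and that every student then receives the selected teacher's pseudo-labels; the class `DPCOSelector` and the function `route_pseudo_labels` are hypothetical names, and the DPC loss term itself is not modeled.

```python
from typing import List


class DPCOSelector:
    """Hypothetical helper: accumulates each teacher's labeled-data loss
    and reports the teacher with the lowest total so far."""

    def __init__(self, num_pairs: int) -> None:
        if num_pairs < 1:
            raise ValueError("need at least one teacher-student pair")
        self.accumulated: List[float] = [0.0] * num_pairs

    def update(self, labeled_losses: List[float]) -> None:
        # One labeled-loss value per teacher for the current step (assumption)
        if len(labeled_losses) != len(self.accumulated):
            raise ValueError("one loss per teacher is required")
        for i, loss in enumerate(labeled_losses):
            self.accumulated[i] += loss

    def best_teacher(self) -> int:
        # Lowest accumulated loss wins; ties go to the lowest index (assumption)
        return min(range(len(self.accumulated)), key=self.accumulated.__getitem__)


def route_pseudo_labels(pseudo_labels: List[list], best: int) -> List[list]:
    # Every student is guided by the pseudo-labels of the selected teacher
    return [pseudo_labels[best] for _ in pseudo_labels]
```

For example, after updates of `[3, 1, 2]` and `[1, 4, 1]` the accumulated losses are `[4, 5, 3]`, so teacher 2 is selected and its pseudo-labels are sent to all students.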
