Co-Learning: Towards Semi-Supervised Object Detection with Road-side Cameras
Jicheng Yuan, Anh Le-Tuan, Ali Ganbarov, Manfred Hauswirth, Danh Le-Phuoc
TL;DR
The paper tackles semi-supervised object detection (SSOD) for road-side camera data under data-scarce conditions. It introduces Co-Learning, a teacher-student framework that combines data curation, annotation alignment, dynamic pseudo-labels, and multi-head student configurations to mitigate pseudo-label noise and task misalignment. On the AI City Challenge Track 2 dataset, starting from only $10\%$ labeled data, the approach achieves robust mAP gains, reaching $36.5$ mAP when unlabeled data and annotation alignment are used, which surpasses an oracle model relying solely on pseudo labels by $0.4\%$. The work demonstrates that label-consistent SSOD can substantially reduce annotation costs while maintaining strong detection performance, and it points toward deployment on edge devices and the integration of visual-language cues for greater practicality.
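The TL;DR mentions dynamic pseudo-labels without stating the rule used to produce them. As one hedged illustration of the general idea, the sketch below filters teacher detections with per-class confidence thresholds that drift toward the teacher's running mean confidence. The `DynamicThreshold` class, its EMA-style update, and the parameters `init`, `momentum`, and `floor` are hypothetical choices made for this example, not the paper's method.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    """One teacher prediction: box (x1, y1, x2, y2), class id, confidence."""
    box: Tuple[float, float, float, float]
    label: int
    score: float


class DynamicThreshold:
    """Hypothetical per-class confidence threshold for pseudo-label filtering.

    Each class threshold is an exponential moving average (EMA) of the
    teacher's mean confidence for that class, clipped below by `floor`.
    Illustrative assumption only; not the Co-Learning paper's exact rule.
    """

    def __init__(self, num_classes: int, init: float = 0.7,
                 momentum: float = 0.9, floor: float = 0.5) -> None:
        self.thresholds: Dict[int, float] = {c: init for c in range(num_classes)}
        self.momentum = momentum
        self.floor = floor  # boxes below this confidence are never accepted

    def filter(self, detections: List[Detection]) -> List[Detection]:
        # Keep boxes whose score clears their class threshold; the survivors
        # become pseudo labels for the student.
        return [d for d in detections if d.score >= self.thresholds[d.label]]

    def update(self, detections: List[Detection]) -> None:
        # Move each class threshold toward that class's mean teacher
        # confidence in the current batch (classes absent from the batch
        # keep their previous threshold).
        by_class: Dict[int, List[float]] = {}
        for d in detections:
            by_class.setdefault(d.label, []).append(d.score)
        for c, scores in by_class.items():
            mean = sum(scores) / len(scores)
            blended = self.momentum * self.thresholds[c] + (1 - self.momentum) * mean
            self.thresholds[c] = max(self.floor, blended)


if __name__ == "__main__":
    dt = DynamicThreshold(num_classes=2)
    preds = [
        Detection((0, 0, 10, 10), label=0, score=0.95),
        Detection((5, 5, 20, 20), label=1, score=0.60),
        Detection((1, 1, 4, 4), label=0, score=0.30),
    ]
    print(len(dt.filter(preds)))  # 1: only the 0.95 box clears 0.7
    dt.update(preds)
    print(dt.thresholds)
```

Starting from thresholds of 0.7, filtering keeps only the 0.95-confidence box; after `update`, both class thresholds fall slightly (to roughly 0.69) because the teacher's mean confidence in this batch is below 0.7. Per-class thresholds are one common way to keep a single global cut-off from discarding too many boxes of harder classes.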
Abstract
Recently, deep learning has expanded rapidly, contributing significantly to the progress of supervised learning methods. However, labeled data in real-world settings can be costly and labor-intensive to acquire, and is sometimes scarce. Because labeling vast datasets for every individual application is impractical, this limits the widespread use of neural networks in practical tasks. To tackle this, semi-supervised learning (SSL) offers a promising solution by using both labeled and unlabeled data to train object detectors, potentially improving detection performance while reducing annotation costs. Nevertheless, SSL faces several challenges, including inconsistent pseudo targets, disharmony between the classification and regression tasks, and the efficient use of abundant unlabeled data, especially on edge devices such as roadside cameras. We therefore developed a teacher-student-based SSL framework, Co-Learning, which employs mutual learning and annotation-alignment strategies to address these challenges and achieves performance comparable to fully supervised solutions using only 10% labeled data.
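The abstract describes a teacher-student framework but not how the teacher is maintained. In common teacher-student SSL methods, such as Mean Teacher and, for detection, Unbiased Teacher, the teacher's weights are an exponential moving average (EMA) of the student's. The minimal sketch below shows that update, assuming (the abstract does not confirm it) that Co-Learning maintains its teacher in a similar way; `ema_update`, `Params`, and the `decay` values are illustrative names and choices.

```python
from typing import Dict, List

# Parameters as plain lists so the sketch needs no framework; a real
# detector would apply the same rule to its weight tensors.
Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, decay: float = 0.999) -> None:
    """Update the teacher in place: theta_t <- decay * theta_t + (1 - decay) * theta_s.

    The teacher receives no gradients; it changes only through this slow
    average of the student, so its pseudo labels fluctuate less than the
    student's raw predictions.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    for name, s_vals in student.items():
        t_vals = teacher[name]
        if len(t_vals) != len(s_vals):
            raise ValueError(f"size mismatch for parameter {name!r}")
        for i, s in enumerate(s_vals):
            t_vals[i] = decay * t_vals[i] + (1.0 - decay) * s


if __name__ == "__main__":
    teacher = {"w": [0.0, 1.0]}
    student = {"w": [1.0, 1.0]}
    for _ in range(3):  # one EMA step per hypothetical training iteration
        ema_update(teacher, student, decay=0.9)
    print(teacher["w"])  # first entry moves toward 1.0 slowly, to about 0.271
```

A decay close to 1 (for example 0.999) makes the teacher change slowly, which is the usual reason it yields steadier pseudo labels than the student; the example uses 0.9 only so the effect is visible after three steps.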
