Cycle Training with Semi-Supervised Domain Adaptation: Bridging Accuracy and Efficiency for Real-Time Mobile Scene Detection
Huu-Phong Phan-Nguyen, Anh Dao, Tien-Huy Nguyen, Tuan Quang, Huu-Loc Tran, Tinh-Anh Nguyen-Nhu, Huy-Thach Pham, Quan Nguyen, Hoang M. Le, Quang-Vinh Dinh
TL;DR
This work tackles real-time mobile scene detection by combining Semi-Supervised Domain Adaptation (SSDA) with Cycle Training (CT) to transfer knowledge from a large teacher to a compact student. A BiT-style ResNet-101x3 teacher is first fine-tuned on labeled data and then used to generate high-confidence pseudo labels for unlabeled data. Writing $f(x)_c$ for the teacher's output score for class $c$, an unlabeled sample $x$ is added to the training set only if its confidence $s(x)=\max_c f(x)_c$ exceeds a threshold $\tau$, in which case it receives the pseudo label $y_p=\arg\max_c f(x)_c$. The augmented data then trains a lightweight MobileNetV2 student through a three-stage CT schedule (Exploitation, Exploration, and Stabilization) designed to preserve representations while maximizing adaptation to the target task. On CamSSD, the approach reaches a Top-1 accuracy of $94.00\%$ and a Top-3 accuracy of $99.17\%$ with an on-device CPU inference time of $1.61$ ms, demonstrating a practical balance between accuracy and efficiency for edge deployment.
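The confidence-thresholded pseudo-labeling step can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the teacher's logits are turned into class probabilities with a softmax (one row per unlabeled image), it keeps a sample when $s(x) \ge \tau$ (the text does not say whether the comparison is strict), and the helper names `softmax`, `select_pseudo_labels`, and `expand_dataset` are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def select_pseudo_labels(probs, tau):
    """Return (indices of kept samples, their pseudo labels).

    Assumption: a sample is kept when s(x) >= tau; the source only says
    a confidence threshold tau is applied.
    """
    scores = probs.max(axis=1)       # s(x)  = max_c f(x)_c
    labels = probs.argmax(axis=1)    # y_p   = argmax_c f(x)_c
    keep = np.flatnonzero(scores >= tau)
    return keep, labels[keep]


def expand_dataset(x_lab, y_lab, x_unlab, probs_unlab, tau):
    """Append confidently pseudo-labeled samples to the labeled set (hypothetical helper)."""
    keep, y_p = select_pseudo_labels(probs_unlab, tau)
    x_all = np.concatenate([x_lab, x_unlab[keep]])
    y_all = np.concatenate([y_lab, y_p])
    return x_all, y_all
```

For example, with $\tau = 0.8$ and teacher probabilities whose row maxima are 0.9, 0.4, and 0.85, only the first and third unlabeled samples are added to the student's training set. The threshold trades quantity for quality: a higher $\tau$ admits fewer but more reliable pseudo labels.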
Abstract
Smartphones are now ubiquitous, and the rapid development of AI has spurred extensive research on applying deep learning to image classification. However, the limited resources of mobile devices make it challenging to balance accuracy against computational efficiency. In this paper, we propose a novel training framework called Cycle Training, a three-stage training process that alternates between exploration and stabilization phases to optimize model performance. Additionally, we incorporate Semi-Supervised Domain Adaptation (SSDA) to leverage large models and unlabeled data, effectively expanding the training dataset. Comprehensive experiments on the CamSSD dataset for mobile scene detection demonstrate that our framework significantly improves classification accuracy while maintaining real-time inference. Specifically, our method achieves 94.00% Top-1 accuracy and 99.17% Top-3 accuracy with a CPU inference time of only 1.61 ms, demonstrating its suitability for real-world mobile deployment.
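Top-k accuracy counts a prediction as correct when the true class appears among the k highest-scoring classes, so Top-1 is ordinary accuracy and Top-3 is the k = 3 case reported above. The sketch below shows this standard definition in NumPy; the function name `top_k_accuracy` is hypothetical, and the code does not reproduce the paper's evaluation script.

```python
import numpy as np


def top_k_accuracy(probs, targets, k):
    """Fraction of samples whose true class is among the k highest scores.

    probs:   (N, C) array of class scores
    targets: (N,) array of integer class labels
    """
    # Column indices of the k largest scores in each row (descending order).
    top_k = np.argsort(-probs, axis=1)[:, :k]
    # A sample is a hit if its target matches any of its top-k classes.
    hits = (top_k == targets[:, None]).any(axis=1)
    return float(hits.mean())
```

With three classes, Top-3 accuracy is trivially 1.0; the metric is informative only when the number of scene classes is larger than k.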
