CATCHFed: Efficient Unlabeled Data Utilization for Semi-Supervised Federated Learning in Limited Labels Environments
Byoungjun Park, Pedro Porto Buarque de Gusmão, Dongjin Ji, Minhoe Kim
TL;DR
CATCHFed tackles extremely label-scarce semi-supervised federated learning (SSFL) with three mechanisms: client-aware adaptive warm-up thresholds (CAWT) that adjust per-class thresholds for each client, a hybrid energy-based thresholding scheme that improves pseudo-label quality, and consistency regularization on unpseudo-labeled data, that is, samples that do not receive a pseudo-label. A sample is pseudo-labeled only when both the confidence and distribution-alignment criteria are met, and samples that fail these criteria still contribute to training through a consistency loss, so more of the unlabeled data is put to use. Empirical results on CIFAR-10/100 and SVHN under IID and Non-IID settings show that CATCHFed consistently outperforms strong baselines, often by notable margins, especially when server labels are extremely limited. The work also provides insights into energy-threshold tuning and calibration, highlighting practical implications for deploying SSFL in real-world, privacy-preserving settings.
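The selection rule summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the threshold values, the use of the standard free-energy score (negative temperature-scaled log-sum-exp of the logits), the choice to accept a sample when its energy is below a threshold, and the squared-difference consistency loss are all assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def energy_score(logits, temperature=1.0):
    # Free-energy score: -T * logsumexp(logits / T). Lower values mean
    # the model treats the sample as more in-distribution.
    z = logits / temperature
    m = z.max(axis=1, keepdims=True)
    return -temperature * (m[:, 0] + np.log(np.exp(z - m).sum(axis=1)))

def select_pseudo_labels(logits, class_thresholds, energy_threshold):
    # Hypothetical hybrid rule: accept a sample only if its confidence
    # clears the threshold of its predicted class AND its energy is low.
    probs = softmax(logits)
    labels = probs.argmax(axis=1)
    confident = probs.max(axis=1) >= class_thresholds[labels]
    low_energy = energy_score(logits) <= energy_threshold
    return labels, confident & low_energy

def consistency_loss(weak_logits, strong_logits, mask):
    # Assumed form: mean squared difference between the predictions on
    # two augmented views, computed only on samples WITHOUT a pseudo-label.
    rejected = ~mask
    if not rejected.any():
        return 0.0
    diff = softmax(weak_logits[rejected]) - softmax(strong_logits[rejected])
    return float((diff ** 2).sum(axis=1).mean())

# Toy batch of three samples and three classes (illustrative values only).
logits = np.array([[8.0, 0.0, 0.0],
                   [0.2, 0.1, 0.0],
                   [0.0, 3.0, 0.0]])
# Per-class thresholds for one client; class 1 is treated as "harder".
class_thresholds = np.array([0.95, 0.80, 0.95])
labels, mask = select_pseudo_labels(logits, class_thresholds,
                                    energy_threshold=-2.0)
```

In this toy batch the first and third samples pass both tests and receive pseudo-labels, while the second sample is rejected and would instead enter `consistency_loss` with its weakly and strongly augmented views.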
Abstract
Federated learning (FL) is a promising paradigm that utilizes distributed client resources while preserving data privacy. Most existing FL approaches assume that clients possess labeled data; in real-world scenarios, however, client-side labels are often unavailable. Semi-supervised federated learning, where only the server holds labeled data, addresses this issue, but its performance degrades significantly as the amount of labeled data decreases. To tackle this problem, we propose CATCHFed, which introduces client-aware adaptive thresholds that account for class difficulty, hybrid thresholds that enhance pseudo-label quality, and consistency regularization on unpseudo-labeled data. Extensive experiments across various datasets and configurations demonstrate that CATCHFed effectively leverages unlabeled client data, achieving superior performance even in extremely limited-label settings.
