Efficient Data Labeling and Optimal Device Scheduling in HWNs Using Clustered Federated Semi-Supervised Learning

Moqbel Hamood, Abdullatif Albaseer, Mohamed Abdallah, Ala Al-Fuqaha

TL;DR

CFSL tackles label scarcity in hierarchical wireless networks (HWNs) by combining clustered federated learning with semi-supervised labeling. It introduces two prediction-model schemes (best-performing specialized model and weighted-averaging ensemble), two prediction-time schemes (split-based and stopping-based), and two device-selection strategies (greedy and round-robin) to label unlabeled data efficiently. The authors provide a convergence analysis under standard assumptions and an evaluation on FEMNIST and CIFAR-10 that shows higher labeling and testing accuracy and up to 51% energy savings. The framework's practical impact lies in enabling scalable, energy-efficient learning in HWNs with limited labeled data.

Abstract

Clustered Federated Multi-task Learning (CFL) has emerged as a promising technique to address statistical challenges, particularly non-independent and identically distributed (non-IID) data across users. However, existing CFL studies rely entirely on the impractical assumption that devices have access to accurate ground-truth labels. This assumption is especially problematic in hierarchical wireless networks (HWNs), where vast amounts of unlabeled data and dual-level model aggregation slow convergence, extend processing time, and increase resource consumption. To this end, we propose Clustered Federated Semi-Supervised Learning (CFSL), a novel framework tailored for realistic scenarios in HWNs. We leverage the specialized models produced by device clustering and present two prediction model schemes: the best-performing specialized model and the weighted-averaging ensemble model. The former assigns the most suitable specialized model to label unlabeled data, while the latter combines the specialized models to capture broader data distributions. CFSL also introduces two novel prediction time schemes, split-based and stopping-based, that determine when labeling should take place, and two device selection strategies, greedy and round-robin. Extensive testing validates CFSL's superiority in labeling/testing accuracy and resource efficiency, achieving up to 51% energy savings.
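
The two prediction model schemes named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, and the use of per-model validation accuracy as both the selection criterion and the ensemble weight is an assumption, since the abstract does not state how models are ranked or weighted.

```python
from typing import List, Sequence


def best_model_index(scores: Sequence[float]) -> int:
    """Best-performing scheme: pick the specialized model with the top score.

    `scores` are assumed to be validation accuracies (an assumption).
    """
    return max(range(len(scores)), key=lambda i: scores[i])


def ensemble_probs(model_probs: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> List[float]:
    """Weighted-averaging scheme: blend class probabilities across models."""
    total = sum(weights)
    n_classes = len(model_probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]


def pseudo_label(probs: Sequence[float]) -> int:
    """Assign the class with the highest predicted probability."""
    return max(range(len(probs)), key=lambda c: probs[c])


# Class probabilities from three specialized (cluster) models for one
# unlabeled sample; the values are made up for illustration.
model_probs = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.8, 0.1],
]
weights = [0.9, 0.6, 0.5]  # hypothetical validation accuracies

best = best_model_index(weights)
label_best = pseudo_label(model_probs[best])
label_ensemble = pseudo_label(ensemble_probs(model_probs, weights))
print(label_best, label_ensemble)
```

In this toy case the two schemes disagree: the single best model votes for class 0, while the weighted average of all three models favors class 1, which is the kind of broader-distribution effect the ensemble scheme is meant to capture.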

Paper Structure

This paper contains 37 sections, 42 equations, 11 figures, 2 tables, 3 algorithms.

Figures (11)

  • Figure 1: The system model.
  • Figure 2: System framework diagram for the CFSL approach.
  • Figure 3: Illustration of the labeling process for CFSL in HWNs.
  • Figure 4: Testing accuracy for the proposed approach compared to labeled CFL (without SSL) at $\Phi=0.70$.
  • Figure 5: Testing accuracy for the proposed approach compared to CFL with SSL using random selection at $\Phi=0.70$.
  • ...and 6 more figures