Empowering HWNs with Efficient Data Labeling: A Clustered Federated Semi-Supervised Learning Approach

Moqbel Hamood, Abdullatif Albaseer, Mohamed Abdallah, Ala Al-Fuqaha

TL;DR

This work addresses learning in HWNs with abundant unlabeled and non-IID data by integrating Clustered Federated Learning (CFL) with Semi-Supervised Learning (SSL) into a framework called Clustered Federated Semi-Supervised Learning (CFSL). CFSL assigns specialized models to clusters, uses them to generate high-quality pseudo-labels for unlabeled data, and selects the best-performing model for each device under latency and bandwidth constraints via a tractable solution method. The authors formulate a joint optimization problem with convergence and resource constraints, implement a clustering-and-labeling pipeline, and validate CFSL on the FEMNIST dataset. The results show higher testing and labeling accuracy and lower labeling latency than the fully labeled CFL and HFL+SSL baselines. The approach offers a scalable, edge-aware framework for exploiting unlabeled data in CFL settings, with practical implications for faster convergence and more reliable labeling in wireless edge networks.
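The selection-and-labeling step described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm, because the summary gives neither their scoring rule nor their latency model. The code makes three hypothetical assumptions. First, a device scores each candidate cluster model by its accuracy on the device's small labeled set. Second, labeling latency equals model download time over the available bandwidth plus on-device inference time. Third, pseudo-labels are kept only when the model's confidence reaches a fixed threshold. All names (`CandidateModel`, `select_model`, `pseudo_label`, `budget_s`, and the toy models) are illustrative.

```python
# Hypothetical sketch: per-device selection of a specialized (cluster) model
# for pseudo-labeling under a latency budget. Not the paper's algorithm.
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Features = Sequence[float]
Probs = List[float]


@dataclass
class CandidateModel:
    name: str
    predict: Callable[[Features], Probs]  # returns class probabilities
    size_bits: float                      # download size (assumed cost model)
    flops_per_sample: float               # inference cost per sample (assumed)


def argmax(probs: Probs) -> int:
    return max(range(len(probs)), key=probs.__getitem__)


def accuracy(model: CandidateModel, xs: Sequence[Features], ys: Sequence[int]) -> float:
    """Accuracy on the device's small labeled set (the assumed selection score)."""
    if not xs:
        return 0.0
    hits = sum(1 for x, y in zip(xs, ys) if argmax(model.predict(x)) == y)
    return hits / len(xs)


def labeling_latency(model: CandidateModel, n_unlabeled: int,
                     bandwidth_bps: float, device_flops: float) -> float:
    """Assumed latency: download the model, then label every unlabeled sample."""
    download = model.size_bits / bandwidth_bps
    compute = n_unlabeled * model.flops_per_sample / device_flops
    return download + compute


def select_model(candidates: List[CandidateModel], xs_l, ys_l, n_unlabeled: int,
                 bandwidth_bps: float, device_flops: float,
                 budget_s: float) -> Optional[CandidateModel]:
    """Return the most accurate candidate whose labeling latency fits the budget."""
    best, best_acc = None, -1.0
    for m in candidates:
        if labeling_latency(m, n_unlabeled, bandwidth_bps, device_flops) > budget_s:
            continue  # infeasible under the latency/bandwidth constraint
        acc = accuracy(m, xs_l, ys_l)
        if acc > best_acc:
            best, best_acc = m, acc
    return best


def pseudo_label(model: CandidateModel, xs_u,
                 threshold: float = 0.9) -> List[Tuple[Features, int]]:
    """Keep only predictions whose top-class probability reaches the threshold."""
    out = []
    for x in xs_u:
        probs = model.predict(x)
        k = argmax(probs)
        if probs[k] >= threshold:
            out.append((x, k))
    return out


# Toy usage with two made-up cluster models on 1-D inputs.
def step_predict(x: Features) -> Probs:
    return [0.95, 0.05] if x[0] < 0.5 else [0.05, 0.95]


def prior_predict(x: Features) -> Probs:
    return [0.6, 0.4]


step_model = CandidateModel("cluster_a", step_predict, size_bits=8e6, flops_per_sample=1e6)
prior_model = CandidateModel("cluster_b", prior_predict, size_bits=1e6, flops_per_sample=1e5)
xs_labeled, ys_labeled = [[0.1], [0.9], [0.2], [0.8]], [0, 1, 0, 1]

chosen = select_model([step_model, prior_model], xs_labeled, ys_labeled,
                      n_unlabeled=3, bandwidth_bps=1e6, device_flops=1e9, budget_s=10.0)
print(chosen.name)                            # cluster_a
print(pseudo_label(chosen, [[0.3], [0.7]]))   # [([0.3], 0), ([0.7], 1)]
```

If the budget is tightened to 5 s, the larger and more accurate `cluster_a` model becomes infeasible and the sketch falls back to `cluster_b`. That model's 0.6 confidence then yields no pseudo-labels at the 0.9 threshold. This is the kind of accuracy-versus-latency trade-off that the paper's constrained formulation is described as addressing.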

Abstract

Clustered Federated Multitask Learning (CFL) has gained considerable attention as an effective strategy for overcoming statistical challenges, particularly when dealing with non-independent and identically distributed (non-IID) data across multiple users. However, much of the existing research on CFL operates under the unrealistic premise that devices have access to accurate ground-truth labels. This assumption becomes especially problematic in hierarchical wireless networks (HWNs), where edge networks contain a large amount of unlabeled data, resulting in slower convergence rates and increased processing times, particularly when dealing with two layers of model aggregation. To address these issues, we introduce a novel framework, Clustered Federated Semi-Supervised Learning (CFSL), designed for more realistic HWN scenarios. Our approach leverages a best-performing specialized model algorithm, wherein each device is assigned a specialized model that is highly adept at generating accurate pseudo-labels for unlabeled data, even when the data stems from diverse environments. We validate the efficacy of CFSL through extensive experiments, comparing it with existing methods highlighted in recent literature. Our numerical results demonstrate that CFSL significantly improves key metrics such as testing accuracy, labeling accuracy, and labeling latency under varying proportions of labeled and unlabeled data, while also accommodating the non-IID nature of the data and the unique characteristics of wireless edge networks.
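In a hierarchical wireless network, "two layers of model aggregation" typically means that devices send updates to an edge server, which aggregates them and forwards the result to a cloud server. The sketch below is a hedged illustration only. It assumes FedAvg-style, sample-weighted averaging at both tiers for a single cluster's model. The paper's actual aggregation rule, and how it combines multiple clusters, are not stated in this summary. The function names and toy parameter vectors are hypothetical.

```python
# Hypothetical sketch of two-tier (device -> edge -> cloud) aggregation for one
# cluster's model, using FedAvg-style sample-weighted averaging at both tiers.
from typing import Dict, List, Tuple

Weights = List[float]
Update = Tuple[Weights, int]  # (model parameters, number of training samples)


def weighted_average(updates: List[Update]) -> Update:
    """Sample-weighted mean of parameter vectors; returns (mean, total samples)."""
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("no samples to aggregate")
    avg = [0.0] * len(updates[0][0])
    for weights, n in updates:
        for i, w in enumerate(weights):
            avg[i] += w * n / total
    return avg, total


def hierarchical_aggregate(edges: Dict[str, List[Update]]) -> Weights:
    """Tier 1: each edge server averages its devices. Tier 2: the cloud averages edges."""
    edge_models = [weighted_average(devices) for devices in edges.values()]
    cloud_model, _ = weighted_average(edge_models)
    return cloud_model


edges = {
    "edge_1": [([1.0, 2.0], 10), ([3.0, 4.0], 30)],
    "edge_2": [([5.0, 6.0], 20)],
}
print(hierarchical_aggregate(edges))  # ≈ [10/3, 13/3]
```

Each edge forwards its total sample count upward, so the two-tier result equals a flat sample-weighted average over all devices. For the first parameter, this is (1·10 + 3·30 + 5·20)/60 = 10/3. Any latency benefit of the hierarchy comes from shorter device-to-edge links, which this sketch does not model.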

Paper Structure (13 sections, 7 equations, 4 figures, 1 algorithm)

Figures (4)

  • Figure 1: The system model
  • Figure 2: Testing accuracy for the proposed and baseline (fully labeled CFL) approaches when $\phi=0.8$.
  • Figure 3: Labeling accuracy for the proposed and baseline approaches with different percentages of labeled data.
  • Figure 4: Labeling latency for the proposed and baseline approaches with different percentages of labeled data.