
Generative AI-Powered Plugin for Robust Federated Learning in Heterogeneous IoT Networks

Youngjoon Lee, Jinu Gong, Joonhyuk Kang

TL;DR

This work tackles the Non-IID data challenge in privacy-preserving federated learning (FL) for heterogeneous IoT environments by introducing a Generative AI-powered plugin. The plugin performs per-device data augmentation to balance class distributions and balanced sampling at the central server to select IID-like devices for aggregation, thereby accelerating convergence and improving robustness in data-scarce settings. On a medical text classification task, the plugin yields substantial gains in model diversity and reduces the number of global training epochs required, demonstrating versatility across FL methods and architectures. The results suggest practical significance for deploying FL in healthcare and IoT, where data are scarce and highly heterogeneous, with potential extensions to multimodal FL.

Abstract

Federated learning enables edge devices to collaboratively train a global model while preserving data privacy by keeping data local. However, the Non-IID nature of the data distribution across devices often hinders model convergence and degrades performance. In this paper, we propose a novel plugin for federated optimization methods that approximates Non-IID data distributions as IID through generative AI-enhanced data augmentation and a balanced sampling strategy. The key idea is to synthesize additional data for underrepresented classes on each edge device, leveraging generative AI to create a more balanced dataset across the FL network. Additionally, a balanced sampling approach at the central server selectively includes only the most IID-like devices, accelerating convergence while maximizing the global model's performance. Experimental results show that our approach significantly improves convergence speed and robustness against data imbalance, establishing a flexible, privacy-preserving FL plugin that is applicable even in data-scarce environments.
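The per-device augmentation idea can be illustrated with a minimal Python sketch. It assumes each device tops up every class to the size of its largest local class; the paper's actual balancing target and generative model are not given here, so `augment_to_balance` and `dummy_generator` are hypothetical names, and the generator is a placeholder rather than a real generative AI call.

```python
from collections import Counter
from typing import Callable, List, Tuple

Sample = Tuple[str, int]  # (text, label)


def augment_to_balance(
    data: List[Sample],
    num_classes: int,
    generate: Callable[[int, int], List[str]],
) -> List[Sample]:
    """Top up every class to the size of the largest local class.

    Assumption: the balancing target is the largest local class count.
    `generate(label, n)` is a hypothetical stand-in for a generative model
    that returns n synthetic texts for the given label.
    """
    counts = Counter(label for _, label in data)
    target = max(counts.values(), default=0)
    augmented = list(data)  # keep all original samples
    for label in range(num_classes):
        deficit = target - counts.get(label, 0)
        if deficit > 0:
            # Only underrepresented (or missing) classes receive synthetic data.
            synthetic = generate(label, deficit)
            augmented.extend((text, label) for text in synthetic)
    return augmented


def dummy_generator(label: int, n: int) -> List[str]:
    # Placeholder: a real system would query a generative model here.
    return [f"synthetic sample {i} for class {label}" for i in range(n)]


if __name__ == "__main__":
    # A skewed local dataset: class 0 dominates, class 2 is absent.
    local = [("a", 0), ("b", 0), ("c", 0), ("d", 1)]
    balanced = augment_to_balance(local, num_classes=3, generate=dummy_generator)
    print(Counter(label for _, label in balanced))
```

Here the skewed device ends up with an equal count per class. Since synthesis happens locally, raw data never leaves the device.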

Paper Structure

This paper contains 13 sections, 13 equations, 5 figures, and 1 algorithm.

Figures (5)

  • Figure 1: Illustration of FL in a heterogeneous environment with the proposed plugin: This approach approximates Non-IID data distributions to IID, balancing statistical heterogeneity across edge devices to enhance convergence.
  • Figure 2: Illustration of data augmentation at an edge device using generative AI to approximate Non-IID data as IID. Original data is supplemented with synthetic data produced by generative AI to balance class distributions.
  • Figure 3: Illustration of the proposed balanced sampling at the central server, which selects a set of $K=2$ edge devices with statistically representative data distributions for aggregation to obtain an optimized global model.
  • Figure 4: Performance comparison of text classification models with and without our proposed plugin.
  • Figure 5: Performance comparison of different FL algorithms enhanced with the proposed plugin.
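The balanced sampling step in Figure 3 can be sketched as ranking devices by how close their label distribution is to uniform and keeping the best $K$. This is an assumption for illustration only: the scoring rule (total variation distance to the uniform distribution), the function names `tv_distance_to_uniform` and `balanced_sample`, and the premise that the server can see per-device label histograms are not taken from the paper.

```python
from typing import Dict, List


def tv_distance_to_uniform(hist: List[int]) -> float:
    """Total variation distance between a label histogram and the uniform distribution."""
    total = sum(hist)
    if total == 0:
        return 1.0  # assumption: treat empty devices as maximally non-IID
    k = len(hist)
    return 0.5 * sum(abs(c / total - 1.0 / k) for c in hist)


def balanced_sample(histograms: Dict[str, List[int]], k: int) -> List[str]:
    """Return the k device IDs whose label distributions are closest to uniform."""
    ranked = sorted(histograms, key=lambda dev: tv_distance_to_uniform(histograms[dev]))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical per-device label counts for a 3-class task.
    devices = {
        "dev_a": [50, 50, 50],  # perfectly balanced
        "dev_b": [90, 5, 5],    # heavily skewed
        "dev_c": [40, 30, 30],  # nearly balanced
        "dev_d": [0, 100, 0],   # single class
    }
    print(balanced_sample(devices, k=2))  # K = 2, as in Figure 3
```

Scoring each device independently keeps selection linear in the number of devices; choosing the $K$-subset whose pooled distribution is most balanced would be an alternative, at the cost of searching over combinations.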