Addressing Heterogeneity in Federated Learning: Challenges and Solutions for a Shared Production Environment
Tatjana Legler, Vinit Hegiste, Ahmed Anwar, Martin Ruskowski
TL;DR
This paper analyzes heterogeneity in federated learning (FL) within shared production environments, focusing on manufacturing settings. It catalogs heterogeneity types (variations in data distribution, quality, and volume across clients) and reviews mitigation strategies such as personalized models, robust aggregation, and strategic client participation. It discusses hierarchical or tiered FL to improve robustness and privacy and highlights methods like Centered Kernel Alignment and Shapley-value-based client selection, as well as local data augmentation to counter class imbalance. The work illuminates the practical impact of heterogeneity on convergence, fairness, and efficiency and proposes directions for scalable, adaptive FL solutions in Industry 4.0 contexts.
Abstract
Federated learning (FL) has emerged as a promising approach to training machine learning models across decentralized data sources while preserving data privacy, particularly in manufacturing and shared production environments. However, data heterogeneity (variations in data distribution, quality, and volume across different clients and production sites) poses significant challenges to the effectiveness and efficiency of FL. This paper provides a comprehensive overview of heterogeneity in FL within the context of manufacturing, detailing the types and sources of heterogeneity, including non-independent and identically distributed (non-IID) data, unbalanced data, variable data quality, and statistical heterogeneity. We discuss the impact of these types of heterogeneity on model training and review current methodologies for mitigating their adverse effects. These methodologies include personalized and customized models, robust aggregation techniques, and client selection techniques. By synthesizing existing research and proposing new strategies, this paper aims to provide insight for effectively managing data heterogeneity in FL, enhancing model robustness, and ensuring fair and efficient training across diverse environments. Future research directions are also identified, highlighting the need for adaptive and scalable solutions to further improve the FL paradigm in the context of Industry 4.0.
