Bridging Local and Federated Data Normalization in Federated Learning: A Privacy-Preserving Approach

Melih Coşğun, Mert Gençtürk, Sinem Sav

TL;DR

This work tackles the challenge of data normalization in federated learning under non-IID data by introducing Federated Normalization (FedNorm), which simulates pooled normalization by exchanging normalization parameters among parties rather than raw data. Because the shared parameters can themselves leak information, the authors extend this idea with Privacy-Preserving Federated (PPF) normalization protocols for Z-score, MinMax, and Robust scaling, all implemented via multiparty fully homomorphic encryption (MHE) on the CKKS scheme. A key technical contribution is a novel encrypted $k$-th ranked element calculation that enables robust scaling entirely in the encrypted domain, reducing information leakage. Empirically, federated normalization consistently outperforms local normalization in non-IID settings, and the PPF protocols achieve practical runtimes with controllable precision loss, making them usable as a preprocessing step for privacy-preserving federated learning (PPFL). The paper thus provides a framework for privacy-preserving data normalization in FL that applies to both regression and classification tasks, with potential extensions beyond normalization primitives.
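To make the idea of simulating pooled normalization concrete, the following is a minimal plaintext sketch, not the authors' protocol. It assumes each client shares only a small summary (count, sum, sum of squares, min, max); the function names `local_stats`, `aggregate`, `z_score`, and `min_max` are hypothetical, and no encryption is applied here, whereas the PPF protocols would aggregate such values under MHE.

```python
import math

def local_stats(values):
    """Summary a client shares in place of its raw data (assumed format)."""
    return {
        "n": len(values),
        "sum": sum(values),
        "sum_sq": sum(v * v for v in values),
        "min": min(values),
        "max": max(values),
    }

def aggregate(stats_list):
    """Combine per-client summaries into global normalization parameters."""
    n = sum(s["n"] for s in stats_list)
    total = sum(s["sum"] for s in stats_list)
    total_sq = sum(s["sum_sq"] for s in stats_list)
    mean = total / n
    # Population variance; clamp at 0 to guard against rounding error.
    var = max(total_sq / n - mean * mean, 0.0)
    return {
        "mean": mean,
        "std": math.sqrt(var),
        "min": min(s["min"] for s in stats_list),
        "max": max(s["max"] for s in stats_list),
    }

def z_score(values, params):
    """Z-score normalization with the shared global mean and std."""
    return [(v - params["mean"]) / params["std"] for v in values]

def min_max(values, params):
    """MinMax normalization with the shared global min and max."""
    span = params["max"] - params["min"]
    return [(v - params["min"]) / span for v in values]

# Three clients with deliberately different (non-IID) value ranges.
clients = [[1.0, 2.0, 3.0], [10.0, 12.0], [20.0, 22.0, 24.0, 26.0]]

# Federated parameters: computed from summaries only.
fed = aggregate([local_stats(c) for c in clients])

# Pooled parameters: computed as if all data sat in one place.
pooled = [v for c in clients for v in c]
pooled_params = aggregate([local_stats(pooled)])
```

In this sketch the federated parameters match the pooled ones up to floating-point rounding, so each client can normalize its own data with `z_score(c, fed)` or `min_max(c, fed)` and obtain the same values pooled normalization would give. Local normalization would instead map every client's data to its own zero mean and unit range, erasing the differences between clients.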

Abstract

Data normalization is a crucial preprocessing step for enhancing model performance and training stability. In federated learning (FL), where data remains distributed across multiple parties during collaborative model training, normalization presents unique challenges due to the decentralized and often heterogeneous nature of the data. Traditional methods rely on either independent client-side processing, i.e., local normalization, or normalizing the entire dataset before distributing it to parties, i.e., pooled normalization. Local normalization can be problematic when data distributions across parties are non-IID, while the pooled normalization approach conflicts with the decentralized nature of FL. In this paper, we explore the adaptation of widely used normalization techniques to FL and define the term federated normalization. Federated normalization simulates pooled normalization by enabling the collaborative exchange of normalization parameters among parties. Thus, it achieves performance on par with pooled normalization without compromising data locality. However, sharing normalization parameters such as the mean introduces potential privacy risks, which we further mitigate through a robust privacy-preserving solution. Our contributions include: (i) We systematically evaluate the impact of various federated and local normalization techniques in heterogeneous FL scenarios, (ii) We propose a novel homomorphically encrypted $k$-th ranked element (and median) calculation tailored for the federated setting, enabling secure and efficient federated normalization, (iii) We propose privacy-preserving implementations of widely used normalization techniques for FL, leveraging multiparty fully homomorphic encryption (MHE).
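Robust scaling needs order statistics (the median and quartiles) of the combined data, which cannot be derived from sums the way the mean can. The sketch below is a plaintext illustration of one common counting-based approach, offered only to show why a $k$-th ranked element primitive is sufficient; it should not be read as the authors' encrypted construction, whose details are not given in this section. The names `count_leq` and `federated_kth`, the bisection strategy, the tolerance `tol`, and the nearest-rank quartile definition are all assumptions.

```python
import math

def count_leq(values, t):
    """Local count of values <= t; the only per-round quantity a client reports."""
    return sum(1 for v in values if v <= t)

def federated_kth(clients, k, tol=1e-9):
    """Approximate the k-th smallest value (1-indexed) over all clients' data.

    Bisects on a threshold and uses only the sum of per-client counts.
    Assumes the global min and max are available as search bounds.
    """
    lo = min(min(c) for c in clients)
    hi = max(max(c) for c in clients)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        total = sum(count_leq(c, mid) for c in clients)  # aggregated count
        if total >= k:
            hi = mid  # at least k values lie at or below mid
        else:
            lo = mid  # fewer than k values lie at or below mid
    return hi

clients = [[1.0, 2.0, 3.0], [10.0, 12.0], [20.0, 22.0, 24.0, 26.0]]
n = sum(len(c) for c in clients)

# Nearest-rank order statistics (an assumed convention; libraries often interpolate).
median = federated_kth(clients, math.ceil(0.50 * n))
q1 = federated_kth(clients, math.ceil(0.25 * n))
q3 = federated_kth(clients, math.ceil(0.75 * n))

def robust_scale(values):
    """Robust scaling: center on the median, scale by the interquartile range."""
    return [(v - median) / (q3 - q1) for v in values]
```

Each bisection round reveals only an aggregated count in this plaintext version; a privacy-preserving variant would additionally have to hide those intermediate counts, which is the kind of leakage the encrypted protocol is meant to reduce.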

Paper Structure

This paper contains 26 sections, 2 equations, 13 figures, 11 tables, 4 algorithms.

Figures (13)

  • Figure 1: Overview of data normalization approaches in federated learning. The top (red) section shows traditional methods: (a) Pooled Normalization, where raw datasets are centrally normalized, and (b) Local Normalization, where each client normalizes data independently. The bottom (blue) section presents our federated methods: (c) Federated Normalization, which aggregates normalization parameters without sharing raw data, and (d) Privacy-Preserving Federated (PPF) Normalization, which enhances (c) using homomorphic encryption to securely aggregate parameters and minimize information leakage.
  • Figure 2: Test F1 scores averaged across datasets, client numbers, and imbalance types for various normalization techniques. YJ transformations were applied only to tabular datasets. Only classification datasets are considered, as F1 scores represent classification performance. Labels prefixed with 'Fed' denote the federated versions of normalization methods, while 'No Norm' represents the baseline without normalization.
  • Figure 3: Test F1 scores under various imbalance settings for 30 clients (50 for image datasets). For each imbalance type, the highly imbalanced (more heterogeneous) parameter setting is selected. Only classification datasets (BCW, CIFAR-10, Hepatitis, MNIST) are included, since the F1 score is a classification metric. The x-axis shows the dataset names.
  • Figure 4: Test F1 results on the MNIST dataset, averaged over client numbers, showing the impact of different heterogeneous scenarios. Under the feature-imbalanced distribution, federated normalization outperforms local normalization.
  • Figure 5: Test $R^2$ scores of Parkinson's Monitoring dataset experiments under feature imbalance, averaged over normalization techniques used. Results are sorted according to $R^2$ scores.
  • ...and 8 more figures