Federated scientific machine learning for approximating functions and solving differential equations with data heterogeneity

Handi Zhang, Langchen Liu, Lu Lu

TL;DR

The proposed federated methods outperform models trained only on local data and achieve accuracy competitive with centralized models trained on all data. The paper also develops a theoretical framework that establishes growth bounds for weight divergence in federated learning relative to traditional centralized learning.

Abstract

By leveraging neural networks, the emerging field of scientific machine learning (SciML) offers novel approaches to complex problems governed by partial differential equations (PDEs). In practical applications, challenges arise from the distributed nature of data, concerns about data privacy, or the impracticality of transferring large volumes of data. Federated learning (FL), a decentralized framework that enables the collaborative training of a global model while preserving data privacy, offers a solution to the challenges posed by isolated data pools and sensitive data. This paper explores the integration of FL and SciML to approximate complex functions and solve differential equations. We propose two novel models: federated physics-informed neural networks (FedPINN) and federated deep operator networks (FedDeepONet). We further introduce various data generation methods to control the degree of non-independent and identically distributed (non-iid) data and use the 1-Wasserstein distance to quantify data heterogeneity in function approximation and PDE learning. We systematically investigate the relationship between data heterogeneity and federated model performance. Additionally, we propose a measure of weight divergence and develop a theoretical framework that establishes growth bounds for weight divergence in federated learning relative to traditional centralized learning. To demonstrate the effectiveness of our methods, we conduct 10 experiments: 2 on function approximation, 5 PDE problems with FedPINN, and 3 PDE problems with FedDeepONet. These experiments demonstrate that the proposed federated methods outperform models trained only on local data and achieve accuracy competitive with centralized models trained on all data.

Paper Structure

This paper contains 34 sections, 1 theorem, 48 equations, 15 figures, 4 tables, 1 algorithm.

Key Result

Theorem 3.1

Consider a federated learning (FL) model with one global aggregation after $E$ local epochs and a learning rate $\eta$. Let $\mathcal{E}_{\text{WD}}^{1, E}$ denote the weight divergence of the FL model compared to a centralized model with the same initialization, learning rate, and trained for $E$ e The conclusion can also be extended to the $l$-th global epoch. The weight divergence $\mathcal{E}_
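As a rough numerical illustration of the setting in Theorem 3.1 (one global aggregation after $E$ local epochs, compared with a centralized model that shares the same initialization and learning rate), the Python sketch below trains a two-parameter linear model with full-batch gradient descent, so one epoch is one gradient step. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy target $y = x^2$, the two equal-size clients on $[0, 0.5)$ and $[0.5, 1)$, the helper names, and in particular the divergence measure, which is taken to be the relative Euclidean distance $\|w_{\text{fed}} - w_{\text{cen}}\| / \|w_{\text{cen}}\|$ because the exact definition is not reproduced in the statement above.

```python
import math


def grad(w, b, xs, ys):
    """Gradient of the mean squared error of the model y = w*x + b on (xs, ys)."""
    n = len(xs)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return gw, gb


def train(w, b, xs, ys, epochs, lr):
    """Full-batch gradient descent; one epoch is one gradient step."""
    for _ in range(epochs):
        gw, gb = grad(w, b, xs, ys)
        w, b = w - lr * gw, b - lr * gb
    return w, b


def make_clients(n=50):
    """Hypothetical non-iid split: client 0 sees [0, 0.5), client 1 sees [0.5, 1)."""
    xs0 = [(j + 0.5) / (2 * n) for j in range(n)]
    xs1 = [0.5 + (j + 0.5) / (2 * n) for j in range(n)]
    target = lambda x: x * x  # toy target function (assumption)
    return [(xs0, [target(x) for x in xs0]), (xs1, [target(x) for x in xs1])]


def weight_divergence(client_data, epochs, lr=0.1, init=(0.0, 0.0)):
    """Relative distance between FedAvg and centralized weights (assumed measure)."""
    # Federated: every client trains locally from the shared initialization,
    # then a single FedAvg aggregation (plain mean, since clients are equal-size).
    local = [train(init[0], init[1], xs, ys, epochs, lr) for xs, ys in client_data]
    w_fed = sum(w for w, _ in local) / len(local)
    b_fed = sum(b for _, b in local) / len(local)

    # Centralized: same initialization and learning rate, trained on the pooled data.
    all_x = [x for xs, _ in client_data for x in xs]
    all_y = [y for _, ys in client_data for y in ys]
    w_cen, b_cen = train(init[0], init[1], all_x, all_y, epochs, lr)

    return math.hypot(w_fed - w_cen, b_fed - b_cen) / math.hypot(w_cen, b_cen)


if __name__ == "__main__":
    clients = make_clients()
    for E in (1, 2, 5, 10):
        print(E, weight_divergence(clients, epochs=E))
```

With a single local epoch the two models coincide up to rounding: averaging one-step updates from a shared initialization is the same as taking one step on the averaged gradient when the clients hold equally many samples. In this toy setup the gap only opens for $E \geq 2$ on heterogeneous data.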

Figures (15)

  • Figure 1: Workflow of federated scientific machine learning for approximating operators and solving differential equations. (Top) In FedSciML, the training data is distributed across $K$ clients. (Bottom) Each client has its own model and dataset. The models are trained through a collaborative procedure that includes (1) aggregation of the local models into the server model and (2) broadcast of the server model back to the local models.
  • Figure 2: Visualization of data generation methods. (A) Data generation for 1D data with two clients at different levels of data heterogeneity. From left to right, the data become increasingly iid. The last column shows how the $W_1$ distance changes with the number of subdomains. (B) 2D data with $x$-partition and two clients. (C) 2D data with $xy$-partition and two clients. (D) 2D data with $xy$-partition and three clients. The last column shows how the mean pairwise $W_1$ distance changes for three, four, and five clients. (E) Visualization of a data generation method for operator learning. From the first to the third column, functions are sampled from different function spaces.
  • Figure 3: Visualization of weight divergence in federated learning. The black dashed line represents the gradient descent trajectory of the centralized model. The blue and orange lines show the gradient descent trajectories of two clients in federated learning, while the green line shows the trajectory of the global model under the FedAvg algorithm.
  • Figure 4: Approximating the Gramacy & Lee function in Section \ref{subsec:gramacy}. (A) $L_2$ relative error for different numbers of subdomains and $W_1$ distances. (B) Weight divergence of hidden layers for different numbers of subdomains and $W_1$ distances. (C) Examples with (left) 2 subdomains and high data heterogeneity and (right) 100 subdomains and low data heterogeneity. The shaded areas represent the subdomain partitions for the two clients.
  • Figure 5: Approximating the Schaffer function with two clients in Section \ref{subsec:schaffer}. (A) $L_2$ relative error for different numbers of subdomains and $W_1$ distances. (B) Weight divergence of hidden layers for different numbers of subdomains and $W_1$ distances. (C) The ground truth of the Schaffer function. (D and E) Comparison of the federated model and two extrapolation baselines under the (D) most non-iid and (E) most iid scenarios. The black dashed line visualizes the $x$-partition 2D data generation for two clients.
  • ...and 10 more figures
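The 1D case in Figure 2(A), where the number of subdomains controls how non-iid the two clients' data are and the $W_1$ distance measures the result, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's data generator: the domain $[0, 1)$, the alternating assignment of subdomains to clients, the uniform grid inside each subdomain, and the function names `partition_two_clients` and `w1_equal_size` are all hypothetical. The distance routine uses the standard fact that, for two 1D empirical distributions with the same number of samples, $W_1$ is the mean absolute difference of the sorted samples.

```python
def partition_two_clients(n_subdomains, points_per_subdomain=50):
    """Split [0, 1) into equal subdomains and deal them alternately to two clients.

    Assumed layout for illustration: even-indexed subdomains go to client 0,
    odd-indexed ones to client 1. Use an even n_subdomains for equal client sizes.
    """
    clients = ([], [])
    width = 1.0 / n_subdomains
    for i in range(n_subdomains):
        for j in range(points_per_subdomain):
            # Uniform grid of cell midpoints inside subdomain i.
            x = (i + (j + 0.5) / points_per_subdomain) * width
            clients[i % 2].append(x)
    return clients


def w1_equal_size(a, b):
    """1-Wasserstein distance between two 1D empirical distributions of equal size."""
    if len(a) != len(b):
        raise ValueError("this shortcut requires equally many samples per client")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


if __name__ == "__main__":
    for n in (2, 4, 10, 100):
        c0, c1 = partition_two_clients(n)
        print(n, w1_equal_size(c0, c1))
```

For this particular alternating layout the distance equals the subdomain width $1/n$ up to floating-point rounding, so more subdomains means a smaller $W_1$ distance and more nearly iid clients. For unequal sample counts or weighted samples, a general-purpose routine such as SciPy's `scipy.stats.wasserstein_distance` would be the more appropriate tool.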

Theorems & Definitions (2)

  • Theorem 3.1
  • proof