Decentralized Personalized Federated Learning for Min-Max Problems

Ekaterina Borodich, Aleksandr Beznosikov, Abdurakhmon Sadiev, Vadim Sushko, Nikolay Savelyev, Martin Takáč, Alexander Gasnikov

TL;DR

The paper tackles decentralized personalized federated learning for saddle point problems by introducing a mixing objective that couples local models through a gossip matrix $W$ and a personalization parameter $\lambda$. It develops three algorithmic strategies—Sliding Opt Comm (for small $\lambda$), Sliding Big Lambda (for large $\lambda$), and a Local Variance-Reduction method—to achieve near-optimal communication and local computation complexities in both deterministic and stochastic settings. It also establishes tight lower bounds for both communications and local oracle calls, showing the proposed methods are optimal in key regimes. Theoretical results are complemented by experiments on bilinear problems and neural networks with adversarial noise, demonstrating favorable communication efficiency and robustness, and highlighting practical applicability to decentralized federated learning with personalization.

Abstract

Personalized Federated Learning (PFL) has witnessed remarkable advancements, enabling the development of innovative machine learning applications that preserve the privacy of training data. However, existing theoretical research in this field has primarily focused on distributed optimization for minimization problems. This paper is the first to study PFL for saddle point problems, which encompass a broader range of optimization problems than minimization alone. In this work, we consider a recently proposed PFL setting with the mixing objective function, an approach that combines learning a global model with locally distributed learners. Unlike most previous work, which considered only the centralized setting, we work in a more general, decentralized setup that allows us to design and analyze more practical, federated ways to connect devices to the network. We propose new algorithms to address this problem and provide a theoretical analysis of smooth (strongly) convex-(strongly) concave saddle point problems in the stochastic and deterministic cases. Numerical experiments for bilinear problems and neural networks with adversarial noise demonstrate the effectiveness of the proposed methods.
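To make the mixing objective more concrete, the sketch below evaluates one plausible form of it: a sum of local saddle functions plus a $\lambda$-weighted penalty that pulls the local models toward consensus through the gossip matrix $W$. Everything specific here is an assumption for illustration, not the paper's exact formulation: the ring-graph Laplacian used as $W$, the bilinear local functions $x_m^\top A_m y_m$, and the helper names `ring_laplacian` and `mixing_objective`.

```python
import numpy as np


def ring_laplacian(n):
    """Laplacian of a ring graph on n >= 3 nodes (hypothetical choice of gossip matrix W)."""
    W = 2.0 * np.eye(n)
    for i in range(n):
        W[i, (i + 1) % n] -= 1.0
        W[i, (i - 1) % n] -= 1.0
    return W


def mixing_objective(X, Y, A, W, lam):
    """Evaluate sum_m x_m^T A_m y_m + lam/2 * tr(X^T W X) - lam/2 * tr(Y^T W Y).

    X, Y : (n, d) arrays; row m holds the local variables of node m.
    A    : (n, d, d) array of local coupling matrices (assumed bilinear local functions).
    W    : (n, n) symmetric positive semidefinite gossip matrix.
    lam  : personalization parameter; larger values penalize disagreement more.
    """
    local = sum(X[m] @ A[m] @ Y[m] for m in range(X.shape[0]))
    # For a graph Laplacian W, tr(X^T W X) sums squared differences over edges,
    # so it vanishes exactly when all connected nodes hold the same model.
    penalty_x = 0.5 * lam * np.trace(X.T @ W @ X)
    penalty_y = 0.5 * lam * np.trace(Y.T @ W @ Y)
    return local + penalty_x - penalty_y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 4, 3
    W = ring_laplacian(n)
    A = rng.standard_normal((n, d, d))
    X = rng.standard_normal((n, d))
    Y = rng.standard_normal((n, d))
    for lam in (0.0, 0.1, 20.0):
        print(lam, mixing_objective(X, Y, A, W, lam))
```

In this sketch, $\lambda = 0$ decouples the nodes into purely local problems, while a large $\lambda$ makes any disagreement between neighbors expensive and so pushes the solution toward a single shared model. That matches the small-$\lambda$ and large-$\lambda$ regimes the summary distinguishes.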

Paper Structure

This paper contains 26 sections, 17 theorems, 121 equations, 5 figures, 3 tables, 6 algorithms.

Key Result

Theorem 1

For any positive parameters $\chi \geq 3$, $\mu > 0$, $L \geq 2\mu$, $\lambda > 0$, $\lambda^+_{\min} > 0$ ($\lambda \lambda_{\min}^+ \geq \mu$), $\lambda_{\max} \geq \lambda^+_{\min}$ and any integer $k > 0$ there exists a problem of the form PF satisfying Assumptions ass:smooth - ass:sc on graph $

Figures (5)

  • Figure 1: Comparison of Algorithm \ref{alg:sliding_opt_comm} with different $T$ on different networks for \ref{PF}+\ref{bilinear} with $\lambda = 0.1$.
  • Figure 2: Comparison of Algorithm \ref{alg:sliding_big_lambda} with different $T$ on different networks for \ref{PF}+\ref{bilinear} with $\lambda = 20$.
  • Figure 3: Comparison of \ref{alg_sum} with different $p = \rho$ on different networks for \ref{PF}+\ref{bilinear} with $\lambda = 0.01$. Top: in terms of all iterations; bottom: in terms of communications.
  • Figure 4: Comparison of \ref{alg_sum} with different $p = \rho$ on different networks for \ref{PF}+\ref{bilinear} with $\lambda = 1$. Top: in terms of all iterations; bottom: in terms of communications.
  • Figure 5: Average accuracy during training for different averaging parameters $p$ and $T$. The first row shows the results of \ref{alg:sliding_opt_comm}, the second row those of \ref{alg_sum}. Red line: accuracy of the local model on local train data; blue line: accuracy of the local model on local test data; black line: accuracy of the global model on global test data. The experiment was repeated 5 times, and the deviations are shown.

Theorems & Definitions (29)

  • Definition 1
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Lemma 1: Lemma 1 from sadiev2022decentralized
  • proof
  • Lemma 2: Lemma 2 from sadiev2022decentralized
  • proof
  • ...and 19 more