FedSOL: Stabilized Orthogonal Learning with Proximal Restrictions in Federated Learning

Gihun Lee; Minchan Jeong; Sangmook Kim; Jaehoon Oh; Se-Young Yun

TL;DR

FedSOL addresses non-IID data in Federated Learning by applying an orthogonal learning strategy under proximal restrictions. It uses a SAM-inspired adversarial perturbation of the weights to compute local gradients that are effectively orthogonal to the proximal gradient, preserving global knowledge while still allowing local learning. The theoretical analysis links FedSOL updates to proximal orthogonality and local objective equivalence. Experiments across multiple datasets, heterogeneity levels, and model architectures show state-of-the-art performance and smoother loss landscapes. The method is also robust to the choice of proximal objective and supports efficient partial perturbation strategies, making FedSOL a practical approach for scalable, privacy-preserving FL.

Abstract

Federated Learning (FL) aggregates locally trained models from individual clients to construct a global model. While FL enables learning a model with data privacy, it often suffers from significant performance degradation when clients have heterogeneous data distributions. This data heterogeneity causes the model to forget the global knowledge acquired from previously sampled clients after being trained on local datasets. Although the introduction of proximal objectives in local updates helps to preserve global knowledge, it can also hinder local learning by interfering with local objectives. To address this problem, we propose a novel method, Federated Stabilized Orthogonal Learning (FedSOL), which adopts an orthogonal learning strategy to balance the two conflicting objectives. FedSOL is designed to identify gradients of local objectives that are inherently orthogonal to directions affecting the proximal objective. Specifically, FedSOL targets parameter regions where learning on the local objective is minimally influenced by proximal weight perturbations. Our experiments demonstrate that FedSOL consistently achieves state-of-the-art performance across various scenarios.
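The two-step update described above (perturb the weights along the proximal direction, then take the local gradient at the perturbed point) can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation: it assumes a SAM-style perturbation `rho * g_p / ||g_p||`, a FedProx-style proximal loss `0.5 * ||w - w_global||^2`, and a quadratic local loss. The function names and the parameters `a`, `c`, `lr`, and `rho` are hypothetical.

```python
import math


def local_grad(w, a, c):
    """Gradient of a toy local loss 0.5 * sum(a_i * (w_i - c_i)^2) (assumed)."""
    return [ai * (wi - ci) for wi, ai, ci in zip(w, a, c)]


def proximal_grad(w, w_global):
    """Gradient of an assumed FedProx-style loss 0.5 * ||w - w_global||^2."""
    return [wi - gi for wi, gi in zip(w, w_global)]


def fedsol_style_step(w, w_global, a, c, lr=0.1, rho=0.5, eps=1e-12):
    """One update in the spirit of FedSOL, under the assumptions stated above."""
    g_p = proximal_grad(w, w_global)
    norm = math.sqrt(sum(g * g for g in g_p))
    # Step 1: perturb the weights along the normalized proximal gradient.
    w_pert = [wi + rho * g / (norm + eps) for wi, g in zip(w, g_p)]
    # Step 2: evaluate the local gradient at the perturbed weights,
    # then apply it to the original (unperturbed) weights.
    g_u = local_grad(w_pert, a, c)
    return [wi - lr * g for wi, g in zip(w, g_u)]


new_w = fedsol_style_step([1.0, 0.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0])
print(new_w)
```

In this sketch, when the local weights equal the global weights the proximal gradient is zero, so the step reduces to a plain local gradient step.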

Paper Structure (57 sections, 3 theorems, 32 equations, 13 figures, 13 tables, 1 algorithm)

Introduction
Proximal Restriction in Local Learning
Proximal Restriction in FL
Forgetting in Local Learning
Proximal Gradient Projection
Proposed Method: FedSOL
Preliminary: Overview of SAM
Adversarial Proximal Perturbation
Step 1: Weight Perturbation
Step 2: Parameter Update
Adaptive Perturbation Strength
Partial Perturbation
Theoretical Analysis
Experiment
Experimental Setups
...and 42 more sections

Key Result

Proposition 1

(Proximal Objective Orthogonality). Given a local loss $\mathcal{L}^k_{\mathrm{local}}$ and its Hessian matrix $\nabla^2\mathcal{L}^k_{\mathrm{local}} \succcurlyeq 0$ evaluated at $\boldsymbol{w}_k$, the change of proximal loss by FedSOL update reduces the conflicts $\langle\boldsymbol{g}_l$ ,$\bol

Figures (13)

Figure 1: An overview of the FedSOL update. At each update, FedSOL computes its update gradient at the proximally perturbed weights. By withstanding the proximal perturbation, FedSOL obtains a local gradient that is orthogonal to the proximal gradient.
Figure 2: CIFAR-10 partition examples across 10 clients.
Figure 3: Effect of FedSOL on local learning in CIFAR-10 ($\alpha$=0.1) by varying $\rho$ values. (a) Average proximal loss across local models. (b) Cosine similarity between FedSOL gradient ($g_u^{\text{FedSOL}}$) and proximal gradient ($g_p$) during local learning.
Figure 4: Performance of FedAvg and FedSOL on CIFAR-10 ($\alpha$=0.1) with various setups: (a) sampling ratio, (b) the number of local epochs, (c) initial learning rate, and (d) perturbation strength. Error bars indicate standard deviations.
Figure 5: Effect of adaptive perturbation strength in CIFAR-10 ($\alpha$=0.1). (a) Server test accuracy after 300 rounds. (b) Layer-wise averaged $\lambda$ values of FedSOL ($\rho=1.0$) at round 200.
...and 8 more figures
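The cosine similarity reported in Figure 3(b) measures how closely the FedSOL gradient aligns with the proximal gradient, with values near zero indicating near-orthogonality. A minimal sketch of that metric on flattened gradient vectors follows. The helper name and the `eps` safeguard are assumptions made for illustration and are not taken from the paper.

```python
import math


def cosine_similarity(u, v, eps=1e-12):
    """Cosine of the angle between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero when either gradient vanishes.
    return dot / (norm_u * norm_v + eps)


# Orthogonal gradients give a similarity of zero.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```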

Theorems & Definitions (3)

Proposition 1
Proposition 2
Theorem 1: Taylor's theorem
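
The role of Taylor's theorem and the positive semi-definite Hessian can be illustrated with a first-order sketch. It is an illustration only, not a restatement of either proposition. It assumes a SAM-style perturbation $\boldsymbol{\epsilon}_p = \rho\,\boldsymbol{g}_p / \|\boldsymbol{g}_p\|$ and a step size $\gamma$. The symbols $\boldsymbol{\epsilon}_p$, $\gamma$, $H = \nabla^2\mathcal{L}^k_{\mathrm{local}}(\boldsymbol{w}_k)$, $\boldsymbol{g}_u$, and $\Delta\mathcal{L}_p$ are notation assumed here, with $\boldsymbol{g}_l$ the local gradient and $\boldsymbol{g}_p$ the proximal gradient at $\boldsymbol{w}_k$.

$$
\boldsymbol{g}_u = \nabla\mathcal{L}^k_{\mathrm{local}}(\boldsymbol{w}_k + \boldsymbol{\epsilon}_p) \approx \boldsymbol{g}_l + H\boldsymbol{\epsilon}_p = \boldsymbol{g}_l + \rho\,\frac{H\boldsymbol{g}_p}{\|\boldsymbol{g}_p\|}
$$

$$
\Delta\mathcal{L}_p \approx -\gamma\,\langle \boldsymbol{g}_p, \boldsymbol{g}_u \rangle \approx -\gamma\,\langle \boldsymbol{g}_p, \boldsymbol{g}_l \rangle - \gamma\rho\,\frac{\boldsymbol{g}_p^{\top} H \boldsymbol{g}_p}{\|\boldsymbol{g}_p\|}
$$

Under $H \succcurlyeq 0$ the last term is non-positive, so to first order the perturbed update raises the proximal loss by less than a plain local step would when $\boldsymbol{g}_l$ conflicts with $\boldsymbol{g}_p$.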
