Why Federated Optimization Fails to Achieve Perfect Fitting? A Theoretical Perspective on Client-Side Optima

Zhongxiang Lei; Qi Yang; Ping Qiu; Gang Zhang; Yuanchi Ma; Jinyan Liu

TL;DR

This work explains why federated optimization can fail to perfectly fit heterogeneous client data. It introduces the assumption of heterogeneous local optima and derives a lower bound on the global objective that grows with the dispersion of those optima. It further characterizes an oscillatory convergence region near the end of training and analyzes three federated-method families (LA, DC, SA), providing an LA-FedAVG trajectory theorem, drift-correction conditions, and an account of SA behavior under heterogeneity. The theoretical results are supported by experiments across diverse neural architectures (GRU, ResNet-18, ViT, DeepSeek) and datasets, and the authors provide the open-source FedTorch framework for replication. The findings offer practical guidance on choosing local update counts, participation rates, and correction strategies to mitigate underfitting in non-iid settings.

Abstract

Federated optimization is a constrained form of distributed optimization that enables training a global model without directly sharing client data. Although existing algorithms can guarantee convergence in theory and often achieve stable training in practice, the reasons behind performance degradation under data heterogeneity remain unclear. To address this gap, the main contribution of this paper is to provide a theoretical perspective that explains why such degradation occurs. We introduce the assumption that heterogeneous client data lead to distinct local optima, and show that this assumption implies two key consequences: 1) the distance among clients' local optima raises the lower bound of the global objective, making perfect fitting of all client data impossible; and 2) in the final training stage, the global model oscillates within a region instead of converging to a single optimum, limiting its ability to fully fit the data. These results provide a principled explanation for performance degradation in non-iid settings, which we further validate through experiments across multiple tasks and neural network architectures. The framework used in this paper is open-sourced at: https://github.com/NPCLEI/fedtorch.
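The two consequences stated above can be illustrated with a toy problem. The following is a minimal sketch under assumptions that are not taken from the paper: each client has a one-dimensional quadratic loss 0.5 * (x - opt_i)^2, the global objective is the unweighted client average, and the server runs a plain FedAvg-style round with uniformly sampled clients. The names `local_loss`, `global_loss`, and `fedavg_round`, the client optima, and the hyperparameters are all hypothetical. In this special case the minimum of the global objective equals half the variance of the local optima, so it is strictly positive whenever the optima differ, and partial participation keeps the global model moving instead of settling at a single point.

```python
import random

def local_loss(x, opt):
    """Assumed quadratic client loss, minimized at `opt` with value 0."""
    return 0.5 * (x - opt) ** 2

def global_loss(x, optima):
    """Global objective: unweighted average of the client losses."""
    return sum(local_loss(x, o) for o in optima) / len(optima)

def fedavg_round(x, optima, sampled, local_steps, lr):
    """One FedAvg-style round: sampled clients run local gradient
    steps starting from x, then the server averages their models."""
    updated = []
    for i in sampled:
        w = x
        for _ in range(local_steps):
            w -= lr * (w - optima[i])  # gradient of the quadratic loss
        updated.append(w)
    return sum(updated) / len(updated)

# Hypothetical, distinct local optima (stand-in for non-iid clients).
optima = [-2.0, 0.0, 1.0, 5.0]
n = len(optima)

# Consequence 1: the global minimum is bounded away from zero.
mean_opt = sum(optima) / n
dispersion = sum((o - mean_opt) ** 2 for o in optima) / n
best_global = global_loss(mean_opt, optima)
print("dispersion of local optima:", dispersion)    # 6.5
print("minimum of global objective:", best_global)  # 3.25

# Consequence 2: with partial participation the global model keeps
# moving inside a region instead of converging to a single point.
rng = random.Random(0)
x_partial = 0.0
tail = []
for rnd in range(200):
    sampled = rng.sample(range(n), 2)  # 2 of 4 clients per round
    x_partial = fedavg_round(x_partial, optima, sampled, local_steps=5, lr=0.2)
    if rnd >= 150:
        tail.append(x_partial)
spread = max(tail) - min(tail)
print("spread of global model over last 50 rounds:", spread)

# Control: full participation converges here (equal curvatures),
# yet the loss at the limit point is still the positive minimum.
x_full = 0.0
for _ in range(200):
    x_full = fedavg_round(x_full, optima, range(n), local_steps=5, lr=0.2)
print("full-participation limit:", x_full, "loss:", global_loss(x_full, optima))
```

The full-participation control converges only because all toy clients share the same curvature. It is included to separate the two effects: even when the iterate settles, the global loss cannot drop below the dispersion-dependent floor.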
