FedPM: Federated Learning Using Second-order Optimization with Preconditioned Mixing of Local Parameters

Hiro Ishii; Kenta Niwa; Hiroshi Sawada; Akinori Fujino; Noboru Harada; Rio Yokota

TL;DR

FedPM tackles drift in local second-order preconditioners in federated learning by replacing simple server-side mixing with preconditioned mixing of local parameters, so that the aggregated update follows the globally preconditioned global gradient. The method decomposes the ideal global second-order update into per-client local updates and server-side mixing, which keeps the optimization second-order at the global level even when clients take multiple local updates. A convergence analysis shows a superlinear rate for strongly convex objectives with a single local update, and FedPM uses FOOF-based preconditioner approximations to scale to deep networks. Empirically, FedPM outperforms first-order (FO) and second-order (SO) baselines on both strongly convex and non-convex tasks, particularly under data heterogeneity.

Abstract

We propose Federated Preconditioned Mixing (FedPM), a novel Federated Learning (FL) method that leverages second-order optimization. Prior methods, such as LocalNewton, LTDA, and FedSophia, have incorporated second-order optimization in FL by performing iterative local updates on clients and applying simple mixing of local parameters on the server. However, these methods often suffer from drift in local preconditioners, which significantly disrupts the convergence of parameter training, particularly in heterogeneous data settings. To overcome this issue, we refine the update rules by decomposing the ideal second-order update (computed using globally preconditioned global gradients) into parameter mixing on the server and local parameter updates on clients. As a result, FedPM introduces preconditioned mixing of local parameters on the server, effectively mitigating drift in local preconditioners. We provide a theoretical convergence analysis demonstrating a superlinear rate for strongly convex objectives in scenarios involving a single local update. To demonstrate the practical benefits of FedPM, we conducted extensive experiments. The results show that FedPM significantly improves test accuracy over conventional methods that use simple mixing, fully leveraging the potential of second-order optimization.
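
To make the contrast between simple mixing and preconditioned mixing concrete, the following is a minimal sketch on a 1-D toy problem, not the paper's algorithm. It assumes each client holds a quadratic loss, takes a single exact Newton step, and that the server mixes local parameters weighted by local curvature, i.e. (sum_i H_i)^-1 sum_i H_i theta_i. The client values and all function names are hypothetical and chosen only for illustration; the paper's actual update rule, its handling of multiple local updates, and its FOOF-based approximation may differ.

```python
# Toy 1-D illustration: client i holds f_i(theta) = 0.5 * a_i * (theta - c_i)**2.
# The curvatures a_i and local optima c_i are made-up values, not from the paper.

def local_newton_step(theta, a, c):
    """One Newton step on a client's local quadratic; returns (new_theta, hessian)."""
    grad = a * (theta - c)
    hess = a
    return theta - grad / hess, hess

def simple_mixing(local_thetas):
    """Plain average of local parameters."""
    return sum(local_thetas) / len(local_thetas)

def preconditioned_mixing(local_thetas, local_hessians):
    """Curvature-weighted mix (assumed form): (sum_i H_i)^-1 * sum_i H_i * theta_i."""
    weighted = sum(h * t for h, t in zip(local_hessians, local_thetas))
    return weighted / sum(local_hessians)

def global_newton_step(theta, clients):
    """Ideal step: global gradient preconditioned by the global Hessian."""
    grad = sum(a * (theta - c) for a, c in clients) / len(clients)
    hess = sum(a for a, _ in clients) / len(clients)
    return theta - grad / hess

clients = [(1.0, 0.0), (9.0, 10.0)]  # heterogeneous (a_i, c_i) pairs
theta0 = 3.0

steps = [local_newton_step(theta0, a, c) for a, c in clients]
thetas = [t for t, _ in steps]
hessians = [h for _, h in steps]

theta_simple = simple_mixing(thetas)                # 5.0
theta_pm = preconditioned_mixing(thetas, hessians)  # 9.0
theta_ideal = global_newton_step(theta0, clients)   # 9.0
```

In this toy setting the curvature-weighted mix reproduces the ideal global Newton step exactly, while the plain average lands at 5.0 because it ignores that the second client's loss is nine times more sharply curved. This only illustrates the single-local-update case on quadratics.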
