Converge Faster, Talk Less: Hessian-Informed Federated Zeroth-Order Optimization
Zhe Li, Bicheng Ying, Zidong Liu, Chaosheng Dong, Haibo Yang
TL;DR
This work introduces HiSo, a Hessian-informed zeroth-order (ZO) federated optimization method that keeps communication strictly scalar-only while using a global diagonal Hessian approximation to accelerate convergence. The authors develop a generalized scalar-only FL framework, derive a Hessian-informed ascent step, and estimate the diagonal curvature with Adam-like updates at no extra communication cost. Under a low-effective-rank Hessian assumption, HiSo achieves convergence rates that are independent of the model dimension $d$ and the Lipschitz constant $L$. It improves on prior ZO-FL baselines both in theoretical guarantees and in LLM fine-tuning experiments, where it needs fewer communication rounds and less total data exchanged. The results indicate that curvature information, incorporated through diagonal preconditioning, can substantially improve ZO-FL efficiency and make it practical for large-scale federated fine-tuning.
Abstract
Zeroth-order (ZO) optimization enables dimension-free communication in federated learning (FL), making it attractive for fine-tuning of large language models (LLMs) due to significant communication savings. However, existing ZO-FL methods largely overlook curvature information, despite its well-established benefits for convergence acceleration. To address this, we propose HiSo, a Hessian-informed ZO federated optimization method that accelerates convergence by leveraging global diagonal Hessian approximations, while strictly preserving scalar-only communication without transmitting any second-order information. Theoretically, for non-convex functions, we show that HiSo can achieve an accelerated convergence rate that is independent of the Lipschitz constant $L$ and model dimension $d$ under some Hessian approximation assumptions, offering a plausible explanation for the observed phenomenon of ZO convergence being much faster than its worst-case $\mathscr{O}(d)$-bound. Empirically, across diverse LLM fine-tuning benchmarks, HiSo delivers a 1$\sim$5$\times$ speedup in communication rounds over existing state-of-the-art ZO-FL baselines. This superior convergence not only cuts communication costs but also provides strong empirical evidence that Hessian information acts as an effective accelerator in federated ZO optimization settings. Our source code is provided at https://github.com/ZidongLiu/DeComFL.
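To make the scalar-only idea concrete, here is a minimal NumPy sketch of one way such a round could look. It is not the authors' implementation. The function names (`perturbation`, `client_scalar`, `apply_update`, `update_hessian`), the forward finite difference, the $h^{-1/2}$ scaling of the perturbation, the second-moment curvature rule, and the toy quadratic losses are all assumptions chosen for illustration; only the overall pattern (shared seed, one scalar per client, curvature updated without extra communication) is taken from the description above.

```python
import numpy as np

def perturbation(seed, h_diag):
    # Regenerate the same direction anywhere from a shared integer seed,
    # then scale it by h^{-1/2} (assumed form of diagonal preconditioning).
    z = np.random.default_rng(seed).standard_normal(h_diag.shape)
    return z / np.sqrt(h_diag)

def client_scalar(loss, x, seed, h_diag, mu=1e-3):
    # Forward finite difference along u: the only value a client uploads.
    u = perturbation(seed, h_diag)
    return (loss(x + mu * u) - loss(x)) / mu

def apply_update(x, g, seed, h_diag, lr):
    # Rebuild the full d-dimensional step from (seed, scalar) alone.
    return x - lr * g * perturbation(seed, h_diag)

def update_hessian(h_diag, g, seed, beta=0.99):
    # Adam-like second-moment average of the reconstructed ZO gradient
    # (an assumed rule); it uses only the seed and scalar already shared.
    g_hat = g * perturbation(seed, h_diag)
    return beta * h_diag + (1.0 - beta) * g_hat**2

# Toy run: two clients with ill-conditioned quadratic losses (hypothetical data).
curvatures = [np.array([100.0, 1.0, 0.01]), np.array([80.0, 2.0, 0.02])]
losses = [lambda x, a=a: 0.5 * np.sum(a * x**2) for a in curvatures]

x = np.ones(3)
h = np.ones(3)
for seed in range(100):
    # Each client uploads one scalar; the server averages them.
    g = np.mean([client_scalar(f, x, seed, h) for f in losses])
    x_next = apply_update(x, g, seed, h, lr=1e-3)
    h = update_hessian(h, g, seed)  # same old h as the step, so all parties agree
    x = x_next
```

In this sketch every party can recompute both the model step and the curvature estimate from the pair (seed, scalar), so the per-round traffic does not grow with the model dimension $d$.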
