Improving Local Training in Federated Learning via Temperature Scaling
Kichang Lee, Pei Zhang, Songkuk Kim, JeongGil Ko
TL;DR
This paper tackles the slow convergence of federated learning (FL) under non-i.i.d. data by introducing FLex&Chill, which applies logit chilling, i.e., a low softmax temperature $T<1$, during local training. The authors provide a theoretical convergence analysis showing that $T<1$ amplifies gradients, and derive the bound $\mathbb{E}[F(w^{(t+1)})-F(w^*)] \le (1-\frac{\eta\mu}{T})\mathbb{E}[F(w^{(t)})-F(w^*)] + \frac{L\eta^2\sigma^2}{2T^2}$. Extensive experiments on FEMNIST, CIFAR10, and CIFAR100 show up to $6\times$ faster convergence and up to $3.37\%$ higher inference accuracy. FLex&Chill is model- and dataset-agnostic and orthogonal to FedProx, SCAFFOLD, and FedBN, and the authors support it with analyses of gradient norms, CKA-based feature-space similarity, and calibration. The work shows that controlling the temperature at training time can robustly accelerate FL in heterogeneous data environments, and it provides open-source tooling to support adoption and further exploration.
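The gradient-amplification effect can be illustrated with a minimal sketch in plain Python (standard library only). It assumes the standard cross-entropy loss on temperature-scaled logits, $\ell(z) = -\log \mathrm{softmax}(z/T)_y$. By the chain rule, $\partial \ell / \partial z = \frac{1}{T}(\mathrm{softmax}(z/T) - e_y)$, where $e_y$ is the one-hot label, and the $1/T$ factor grows as $T$ drops below 1. The function names (`softmax`, `ce_loss`, `ce_grad_logits`, `l2_norm`) are illustrative and not taken from the FLex&Chill code.

```python
import math

def softmax(z, T=1.0):
    """Temperature-scaled softmax: softmax(z / T)."""
    scaled = [v / T for v in z]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ce_loss(z, label, T=1.0):
    """Cross-entropy of the temperature-scaled softmax for one sample."""
    return -math.log(softmax(z, T)[label])

def ce_grad_logits(z, label, T=1.0):
    """Analytic gradient of ce_loss w.r.t. the raw logits z:
    (softmax(z / T) - onehot(label)) / T, where 1/T comes from the chain rule."""
    p = softmax(z, T)
    return [(p_k - (1.0 if k == label else 0.0)) / T for k, p_k in enumerate(p)]

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

# Usage: one sample whose true class (index 1) is not yet the top logit.
z, label = [2.0, 1.0, 0.5], 1
for T in (1.0, 0.5):
    print(f"T={T}: |dL/dz| = {l2_norm(ce_grad_logits(z, label, T)):.3f}")
```

For this sample, the gradient norm at $T=0.5$ is larger than at $T=1$. For a sample that is already confidently correct (e.g. `label = 0` here), the sharper softmax can offset or even outweigh the $1/T$ factor, so the amplification is not uniform across samples.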
Abstract
Federated learning is inherently hampered by data heterogeneity: training data are non-i.i.d. across local clients. We propose FLex&Chill, a novel model training approach for federated learning that exploits the Logit Chilling method. Through extensive evaluations, we demonstrate that, in the presence of the non-i.i.d. data characteristics inherent in federated learning systems, this approach can expedite model convergence and improve inference accuracy. Quantitatively, our experiments show up to 6× faster convergence of the global federated learning model and up to 3.37% higher inference accuracy.
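Where the temperature enters a federated round can be sketched as follows. This is a minimal, hypothetical sketch, not the authors' implementation. It assumes FedAvg-style aggregation weighted by client dataset size, a linear softmax classifier, and synthetic clients with extreme label skew (client $k$ holds only class $k$). The temperature is applied only inside the client-side loss. `local_train`, `fedavg`, `make_client`, and `run` are illustrative names.

```python
import math
import random

def softmax(z, T=1.0):
    scaled = [v / T for v in z]
    m = max(scaled)  # numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def logits(W, x):
    # Linear classifier: one weight row per class; x carries a constant 1.0 as bias input.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def local_train(W_global, data, T, lr=0.05, epochs=3):
    """Client-side SGD on temperature-scaled cross-entropy (logit chilling when T < 1)."""
    W = [row[:] for row in W_global]  # work on a copy; the global model is not mutated
    for _ in range(epochs):
        for x, y in data:
            p = softmax(logits(W, x), T)
            for k, row in enumerate(W):
                g_k = (p[k] - (1.0 if k == y else 0.0)) / T  # dLoss/dz_k, includes 1/T
                for j, xj in enumerate(x):
                    row[j] -= lr * g_k * xj
    return W

def fedavg(models, sizes):
    """Average client models, weighted by local dataset size."""
    n = sum(sizes)
    K, D = len(models[0]), len(models[0][0])
    return [[sum(m[k][j] * s for m, s in zip(models, sizes)) / n for j in range(D)]
            for k in range(K)]

def predict(W, x):
    z = logits(W, x)
    return max(range(len(z)), key=z.__getitem__)  # argmax is unchanged by any T > 0

def accuracy(W, data):
    return sum(predict(W, x) == y for x, y in data) / len(data)

def make_client(label, n, rng, centers):
    cx, cy = centers[label]
    # 2-D Gaussian point plus a constant 1.0 feature acting as a bias input.
    return [([rng.gauss(cx, 0.5), rng.gauss(cy, 0.5), 1.0], label) for _ in range(n)]

def run(T, rounds=10, seed=0):
    rng = random.Random(seed)  # same seed, so both temperatures see identical data
    centers = [(2.0, 0.0), (-1.0, 1.7), (-1.0, -1.7)]
    clients = [make_client(k, 30, rng, centers) for k in range(3)]  # client k: class k only
    test = [s for k in range(3) for s in make_client(k, 50, rng, centers)]
    W = [[0.0] * 3 for _ in range(3)]
    for _ in range(rounds):
        local = [local_train(W, data, T) for data in clients]
        W = fedavg(local, [len(d) for d in clients])
    return accuracy(W, test)

for T in (1.0, 0.5):
    print(f"T={T}: test accuracy after 10 rounds = {run(T):.2f}")
```

Any difference between the printed accuracies for this toy setup is illustrative only. The reported gains (up to 6× faster convergence) come from experiments on FEMNIST, CIFAR10, and CIFAR100, not from a setup like this one.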
