Improving Local Training in Federated Learning via Temperature Scaling

Kichang Lee, Pei Zhang, Songkuk Kim, JeongGil Ko

TL;DR

This paper tackles the slow convergence of federated learning under non-i.i.d. data by introducing FLex&Chill, which applies logit chilling through a low softmax temperature $T<1$ during local training. The authors provide theoretical convergence analysis showing gradient amplification with $T<1$ and a bound $\mathbb{E}[F(w^{(t+1)})-F(w^*)] \le (1-\frac{\eta\mu}{T})\mathbb{E}[F(w^{(t)})-F(w^*)] + \frac{L\eta^2\sigma^2}{2T^2}$, alongside extensive empirical results across FEMNIST, CIFAR10, and CIFAR100 demonstrating up to $6\times$ faster convergence and up to $3.37\%$ improvement in inference accuracy. FLex&Chill is model- and dataset-agnostic, orthogonal to FedProx, SCAFFOLD, and FedBN, and is supported by analyses of gradient norms, CKA-based feature-space similarity, and calibration. The work shows that training-time temperature control can robustly accelerate FL in heterogeneous data environments and provides open-source tooling to enable adoption and further exploration.
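To make the trade-off inside the stated bound concrete, the recursion can be iterated numerically. The sketch below is illustrative only: it treats the inequality as an equality, and it assumes the usual reading of the symbols ($\eta$ as learning rate, $\mu$ as strong-convexity constant, $L$ as smoothness constant, $\sigma^2$ as gradient-noise variance), which this summary does not define. The function names and the numeric values are hypothetical and are not taken from the paper.

```python
def bound_step(gap, eta, mu, L, sigma2, T):
    """One application of the bound: (1 - eta*mu/T) * gap + L*eta^2*sigma2 / (2*T^2)."""
    return (1.0 - eta * mu / T) * gap + L * eta ** 2 * sigma2 / (2.0 * T ** 2)


def iterate_bound(gap0, rounds, eta, mu, L, sigma2, T):
    """Iterate the bound for a number of rounds, returning every intermediate value."""
    gaps = [gap0]
    for _ in range(rounds):
        gaps.append(bound_step(gaps[-1], eta, mu, L, sigma2, T))
    return gaps


def noise_floor(eta, mu, L, sigma2, T):
    """Fixed point of the recursion: the additive term divided by (eta*mu/T)."""
    return L * eta * sigma2 / (2.0 * mu * T)


if __name__ == "__main__":
    # Hypothetical constants chosen only so that 0 < eta*mu/T < 1.
    params = dict(eta=0.01, mu=1.0, L=10.0, sigma2=1.0)
    for T in (1.0, 0.5):
        gaps = iterate_bound(1.0, 2000, T=T, **params)
        print(f"T={T}: round 10 -> {gaps[10]:.4f}, "
              f"round 2000 -> {gaps[-1]:.4f}, "
              f"floor -> {noise_floor(T=T, **params):.4f}")
```

Under these assumed constants, a lower $T$ shrinks the contraction factor $1-\frac{\eta\mu}{T}$, so the bound decays faster in early rounds, while the fixed point $\frac{L\eta\sigma^2}{2\mu T}$ grows. This is a property of the inequality as written, not an empirical claim about FLex&Chill.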

Abstract

Federated learning is inherently hampered by data heterogeneity: non-i.i.d. training data over local clients. We propose a novel model training approach for federated learning, FLex&Chill, which exploits the Logit Chilling method. Through extensive evaluations, we demonstrate that, in the presence of non-i.i.d. data characteristics inherent in federated learning systems, this approach can expedite model convergence and improve inference accuracy. Quantitatively, from our experiments, we observe up to 6X improvement in the global federated learning model convergence time, and up to 3.37% improvement in inference accuracy.

Paper Structure

This paper contains 21 sections, 31 equations, 24 figures, 9 tables, and 1 algorithm.

Figures (24)

  • Figure 1: Effect of varying temperature $T$ on the output distribution of the softmax function, illustrating how lower $T$ sharpens class probabilities and higher $T$ produces smoother, more uniform distributions. Best viewed in color.
  • Figure 2: Distribution of gradient norm at input layer for correctly (top) / incorrectly (bottom) inferred samples with varying training temperatures.
  • Figure 3: Distributions depicting differences between distances to the decision boundary before and after model updates for varying training temperatures. Notice that lower temperatures show a noticeable shift in estimations' positions on the representation space, suggesting their aggressiveness in modifying the model even with a small number of training samples.
  • Figure 4: Example of data points in the 2D representation space with their respective classification boundaries for different federated learning clients with varying training temperatures. Best viewed in color.
  • Figure 5: Visualization of the distribution of training data used in FEMNIST, CIFAR10-CNN and CIFAR100-ResNet experiments. Best viewed in color.
  • ...and 19 more figures
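
The temperature effect described for Figure 1, and the gradient behaviour that Figure 2 examines, can be explored with a minimal pure-Python sketch. It assumes the standard temperature-scaled softmax $p_i = \exp(z_i/T) / \sum_j \exp(z_j/T)$ with a cross-entropy loss, for which the gradient with respect to the logits is $(p - y)/T$. The function names are hypothetical, and this is not the authors' released implementation.

```python
import math


def softmax_t(logits, temperature=1.0):
    """Softmax over logits divided by the temperature (numerically stabilised)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def chilled_loss(logits, label, temperature=1.0):
    """Cross-entropy of the temperature-scaled softmax for one sample."""
    return -math.log(softmax_t(logits, temperature)[label])


def chilled_grad(logits, label, temperature=1.0):
    """Gradient of chilled_loss w.r.t. the raw logits: (p - one_hot) / T."""
    probs = softmax_t(logits, temperature)
    return [(p - (1.0 if i == label else 0.0)) / temperature
            for i, p in enumerate(probs)]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # arbitrary example logits
    for T in (2.0, 1.0, 0.5):
        probs = softmax_t(logits, T)
        grad = chilled_grad(logits, label=1, temperature=T)
        norm = math.sqrt(sum(g * g for g in grad))
        print(T, [round(p, 3) for p in probs], round(norm, 3))
```

The $1/T$ factor amplifies the gradient for $T<1$, but the probabilities themselves also change with $T$, so the gradient norm for a given sample is not guaranteed to grow. For a sample with all-equal logits the probabilities are uniform at every temperature, and the norm scales exactly as $1/T$.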

Theorems & Definitions (1)

  • proof