
Temperature Scaling Attack Disrupting Model Confidence in Federated Learning

Kichang Lee, Jaeho Jin, JaeYeon Park, Songkuk Kim, JeongGil Ko

TL;DR

This work identifies predictive confidence calibration as a distinct, actionable attack surface in federated learning and introduces the Temperature Scaling Attack (TSA), a training-time mechanism that degrades calibration by injecting temperature scaling into local updates while preserving accuracy. The authors establish an effective-step-size invariance via learning-rate–temperature coupling, $\eta = \frac{\tilde{\eta}}{\tau}$, and provide a non-convex convergence analysis showing stable optimization under non-IID data despite miscalibration. Empirically, TSA delivers large calibration errors (e.g., up to +145% ECE on CIFAR-100) with minimal accuracy changes and remains effective against robust aggregation and post-hoc calibration defenses, as demonstrated across MNIST, CIFAR, healthcare, robotics, and language-generation case studies. The results highlight calibration integrity as a critical, under-defended facet of FL, motivating calibration-aware auditing and defenses to protect safety-critical decision pipelines relying on probabilistic confidence.

Abstract

Predictive confidence serves as a foundational control signal in mission-critical systems, directly governing risk-aware logic such as escalation, abstention, and conservative fallback. While prior federated learning attacks predominantly target accuracy or implant backdoors, we identify confidence calibration as a distinct attack objective. We present the Temperature Scaling Attack (TSA), a training-time attack that degrades calibration while preserving accuracy. By injecting temperature scaling with learning-rate–temperature coupling during local training, malicious updates maintain benign-like optimization behavior, evading accuracy-based monitoring and similarity-based detection. We provide a convergence analysis under non-IID settings, showing that this coupling preserves standard convergence bounds while systematically distorting confidence. Across three benchmarks, TSA substantially shifts calibration (e.g., 145% error increase on CIFAR-100) with <2% accuracy change, and remains effective under robust aggregation and post-hoc calibration defenses. Case studies further show that confidence manipulation can cause up to 7.2x increases in missed critical cases (healthcare) or false alarms (autonomous driving), even when accuracy is unchanged. Overall, our results establish calibration integrity as a critical attack surface in federated learning.
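
To make the mechanism concrete, the following is a minimal sketch, not the authors' implementation, of a temperature-scaled softmax cross-entropy and its gradient with respect to the logits. It assumes the common convention of dividing logits by $\tau$ before the softmax; the function names and the example logits are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, tau=1.0):
    """Temperature-scaled softmax, softmax(z / tau), computed stably."""
    scaled = [z / tau for z in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label, tau=1.0):
    """Cross-entropy of the temperature-scaled softmax for one example."""
    return -math.log(softmax(logits, tau)[label])


def grad_wrt_logits(logits, label, tau=1.0):
    """Gradient of the loss w.r.t. the logits: (softmax(z/tau) - onehot) / tau."""
    probs = softmax(logits, tau)
    return [
        (p - (1.0 if k == label else 0.0)) / tau
        for k, p in enumerate(probs)
    ]


if __name__ == "__main__":
    z = [2.0, 0.5, -1.0]  # illustrative logits, not taken from the paper
    for tau in (0.5, 1.0, 2.0):
        conf = max(softmax(z, tau))
        grad = grad_wrt_logits(z, label=0, tau=tau)
        print(f"tau={tau}: training-time confidence={conf:.3f}, grad={grad}")
    # Evaluation reads the same logits through the standard tau = 1 mapping.
    print("test-time confidence:", round(max(softmax(z, 1.0)), 3))
```

For the same logits, $\tau{>}1$ flattens the distribution seen during training and $\tau{<}1$ sharpens it, while the deployed model is read out at $\tau{=}1$. That mismatch between the training-time and test-time mappings is the lever the abstract describes.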

Paper Structure (37 sections, 2 theorems, 30 equations, 20 figures, 5 tables)


Key Result

Lemma 1

Consider the temperature-scaled softmax cross-entropy loss $\ell_\tau(\theta;x,y)$. Assume the logit vector $\mathbf{z}(\theta;x)\in\mathbb{R}^K$ is twice differentiable and its Jacobian w.r.t. $\theta$ is bounded as $\|\mathbf{J}_\theta\mathbf{z}(\theta;x)\|\le G(x)$, where $G(x)$ is an input-dependent [...] for some $C(x)$ depending on $G(x)$ and $\|x\|$. Consequently, the population objective is $L$-smooth [...]
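
For orientation, the standard identities behind a temperature-dependent smoothness bound can be written out. This is a sketch that assumes the usual convention of dividing logits by $\tau$ before the softmax; it is not the lemma's own statement, which is truncated in this listing. The symbols $\mathbf{p}_\tau$ (the temperature-scaled softmax output) and $\mathbf{e}_y$ (the one-hot vector for label $y$) are notation introduced here for illustration.

$$
\ell_\tau(\theta;x,y) = -\log \frac{\exp(z_y/\tau)}{\sum_{k=1}^{K}\exp(z_k/\tau)},
\qquad
\nabla_{\mathbf{z}}\,\ell_\tau = \frac{1}{\tau}\left(\mathbf{p}_\tau - \mathbf{e}_y\right),
\qquad
\nabla^2_{\mathbf{z}}\,\ell_\tau = \frac{1}{\tau^{2}}\left(\operatorname{diag}(\mathbf{p}_\tau) - \mathbf{p}_\tau\mathbf{p}_\tau^{\top}\right).
$$

The logit-space gradient carries a factor $1/\tau$ and the logit-space Hessian a factor $1/\tau^{2}$, which is consistent with a smoothness constant that depends on the training temperature.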

Figures (20)

  • Figure 1: Effect of training-time temperature scaling on logit updates and test-time confidence. Solid curves show the $\tau$-scaled sigmoid used to compute updates (one curve for $\tau{<}1$, one for $\tau{>}1$), while evaluation uses the standard $\tau{=}1$ mapping (dashed curve), so updated logits ($\bullet \rightarrow \bullet$) can yield shifted probabilities and degraded calibration; note the different x-axis ranges.
  • Figure 2: Confidence distribution and reliability diagrams under different training temperatures $\tau$ on CIFAR10–CNN. Top row: histogram of predicted confidences with average confidence (red dashed) and accuracy (blue dashed). Bottom row: reliability diagrams showing the gap between bin accuracy and confidence. Low temperatures ($\tau{<}1$) induce under-confidence, while high temperatures ($\tau{>}1$) induce over-confidence, despite accuracy remaining largely unchanged.
  • Figure 3: Global accuracy (left) and sECE (right) over rounds on CIFAR10–CNN for different training temperatures $\tau$. Accuracy remains similar across $\tau$, while sECE shifts systematically, indicating controllable under-/over-confidence.
  • Figure 4: Cosine similarity between benign updates (baseline, $\tau{=}1$) and updates under different attacks. Temperature scaling maintains high similarity with benign gradients, whereas noise addition and label flipping sharply reduce similarity.
  • Figure 5: Seed-to-seed similarity under benign training ($\tau{=}1$): pairwise logit cosine similarity (left) and penultimate-layer CKA (right) on the same test set.
  • ...and 15 more figures
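
The captions above rely on calibration error and update similarity. As a rough sketch of how such quantities are commonly computed, the block below implements a binned expected calibration error (ECE), a signed variant, and cosine similarity between two flattened update vectors. The signed variant is an assumption: it takes confidence minus accuracy per bin, so positive values indicate over-confidence, and it may differ from the paper's sECE definition. All function names are hypothetical.

```python
import math


def calibration_errors(confidences, correct, n_bins=10):
    """Return (ECE, signed ECE) using equal-width confidence bins.

    confidences: predicted top-class probabilities in [0, 1].
    correct: booleans (or 0/1) marking whether each prediction was right.
    Signed ECE here is confidence minus accuracy (an assumed convention).
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, 1.0 if ok else 0.0))

    ece = 0.0
    signed = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(a for _, a in members) / len(members)
        weight = len(members) / n
        ece += weight * abs(avg_conf - accuracy)
        signed += weight * (avg_conf - accuracy)
    return ece, signed


def cosine_similarity(u, v):
    """Cosine similarity between two flattened update vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy data, not results from the paper: 90% confidence but 75% accuracy.
    print(calibration_errors([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
    print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

A signed metric separates the two failure directions that unsigned ECE merges, which is why a shift toward under- or over-confidence can be read off directly from its sign.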

Theorems & Definitions (2)

  • Lemma 1: Temperature-dependent smoothness
  • Theorem 1: Non-convex convergence with constant $\beta$.