Technical note on Fisher Information for Robust Federated Cross-Validation

Behraj Khan, Tahir Qasim Syed

TL;DR

FIRE introduces a Fisher-information-based regularizer to mitigate fragmentation-induced covariate shift (FICS) in both batch/fold and federated learning settings by aligning training to a fixed validation distribution $P_{val}$. It formalizes a unified framework, derives a tractable Fisher Information Matrix (FIM) surrogate for $D_{KL}(P_i \| P_{val})$, and aggregates FIMs across batches or clients to guide updates with a Fisher-regularized objective. Empirically, FIRE yields consistent accuracy gains over importance-weighting and FL baselines across 39 datasets, with notable improvements under varying fragmentation frequencies and non-IID client conditions, while maintaining stability. The work provides theoretical bounds linking KL divergence to the Fisher surrogate and demonstrates practical, low-overhead computation via diagonal or low-rank FIM approximations. Overall, FIRE offers a principled, scalable approach to robust validation-time distribution alignment in fragmented and federated learning scenarios, enhancing generalization to the target validation distribution.

Abstract

When training data are fragmented across batches or federated across different geographic locations, trained models suffer performance degradation. That degradation is partly due to covariate shift: fragmenting data across time and space produces dissimilar empirical training distributions. Each fragment's distribution differs slightly from a hypothetical unfragmented training distribution of covariates, and from the single validation distribution. To address this problem, we propose Fisher Information for Robust fEderated validation (FIRE). This method accumulates fragmentation-induced covariate shift divergences from the global training distribution via an approximate Fisher information. That term, which we prove to be a more computationally tractable estimate, is then used as a per-fragment loss penalty, enabling scalable distribution alignment. FIRE outperforms importance weighting benchmarks by up to $5.1\%$ and federated learning (FL) benchmarks by up to $5.3\%$ on shifted validation sets.
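
To make the idea of a Fisher-based per-fragment penalty concrete, the following is a minimal sketch and not the authors' implementation. It assumes a logistic-regression model, a diagonal empirical Fisher computed on validation samples, and a quadratic penalty of the form $\frac{\lambda}{2}\sum_j F_j(\theta_j - \theta^{ref}_j)^2$. The exact FIRE objective is not reproduced in this summary, so the penalty form and the names `diagonal_empirical_fisher`, `fisher_penalty`, `theta_ref`, and `lam` are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def diagonal_empirical_fisher(theta, xs, ys):
    """Diagonal empirical Fisher of a logistic model, averaged over (xs, ys).

    Assumption: the samples stand in for draws from the validation distribution.
    """
    dim = len(theta)
    fisher = [0.0] * dim
    for x, y in zip(xs, ys):
        p = sigmoid(sum(t * xj for t, xj in zip(theta, x)))
        # For logistic regression, d/dtheta log p(y | x, theta) = (y - p) * x.
        for j in range(dim):
            g = (y - p) * x[j]
            fisher[j] += g * g
    return [f / len(xs) for f in fisher]


def fisher_penalty(theta, theta_ref, fisher_diag, lam):
    """Assumed penalty: 0.5 * lam * sum_j F_j * (theta_j - theta_ref_j)^2."""
    return 0.5 * lam * sum(
        f * (t - r) ** 2 for f, t, r in zip(fisher_diag, theta, theta_ref)
    )


# Toy validation samples (first feature is a bias term).
val_x = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
val_y = [1, 0, 1]

fisher = diagonal_empirical_fisher([0.0, 0.0], val_x, val_y)
penalty = fisher_penalty([1.0, 2.0], [0.0, 0.0], fisher, lam=2.0)
```

In a training loop, `penalty` would be added to each fragment's task loss, so that parameters with high Fisher information are pulled more strongly toward the reference point. A diagonal Fisher stores one value per parameter, which is in line with the low-overhead approximations mentioned in the TL;DR.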

Paper Structure

This paper contains 25 sections, 10 theorems, 37 equations, 3 figures, 9 tables, 1 algorithm.

Key Result

Lemma 2.3

Under Assumption ass:data, with density ratio $r(x) = P_i(x)/P_{\mathrm{val}}(x)$, we have $D_{\mathrm{KL}}(P_i(x)\|P_{\mathrm{val}}(x)) = \mathbb E_{x\sim P_{\mathrm{val}}}[ r(x)\log r(x)] \le C_1 \gamma^2 + C_1' \gamma^3,$ where one can take, for instance, $C_1=\frac{1}{2(1-\gamma)},\qquad C_1'=\frac{1}{3(1-\gamma)^2}.$ The proof is given in the appendix.
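
The bound can be checked numerically on a small discrete example. The sketch below assumes that the data assumption amounts to a uniform bound $|r(x) - 1| \le \gamma < 1$ on the density ratio; this reading is not stated in this summary and should be treated as an assumption. The helper names `kl_divergence`, `ratio_deviation`, and `lemma_bound` are illustrative only.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ratio_deviation(p, q):
    """gamma = max_x |r(x) - 1| with r = p / q (assumed reading of the assumption)."""
    return max(abs(pi / qi - 1.0) for pi, qi in zip(p, q))


def lemma_bound(gamma):
    """C1 * gamma^2 + C1' * gamma^3 with the constants quoted in Lemma 2.3."""
    c1 = 1.0 / (2.0 * (1.0 - gamma))
    c1_prime = 1.0 / (3.0 * (1.0 - gamma) ** 2)
    return c1 * gamma ** 2 + c1_prime * gamma ** 3


# Toy fragment and validation distributions over two bins.
p_val = [0.5, 0.5]
p_i = [0.6, 0.4]

gamma = ratio_deviation(p_i, p_val)
kl = kl_divergence(p_i, p_val)
bound = lemma_bound(gamma)
print(f"gamma={gamma:.3f}  KL={kl:.5f}  bound={bound:.5f}")
```

For small $\gamma$ the bound is dominated by the quadratic term, so the marginal KL divergence shrinks at least quadratically as the fragment's distribution approaches the validation distribution.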

Figures (3)

  • Figure 1: FIRE working mechanism in FL setting. The server broadcasts the global model $\theta$ and global FIM $I_G(\theta)$ to all clients. Each client $k$ computes its local FIM $I_k(\theta)$ using the shared validation set $P_{\text{val}}(x)$. Clients perform a local update regularized by $I_G(\theta)$ and send their local FIMs back to the server. The server then aggregates the client FIMs (e.g., $I_G(\theta) = \sum_{k=1}^N \frac{n_k}{N} I_k(\theta)$) to update the global FIM for the next round. This unified approach ensures model alignment with the target validation distribution in both settings.
  • Figure 2: st-CV and FIRE, $\Delta$ accuracy for varying number of folds.
  • Figure 3: Effect of batching frequency. As the batching frequency increases, the drop in accuracy also increases.
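
The federated round described in the Figure 1 caption can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's algorithm: FIMs are stored as diagonals (plain lists), the aggregation weights are normalized as $n_k / \sum_j n_j$ (the caption writes $n_k/N$), and the local step uses an assumed gradient of a quadratic Fisher penalty around the broadcast global model. The function names and the parameters `lam` and `lr` are hypothetical.

```python
def aggregate_fisher(client_fishers, client_sizes):
    """Server side: sample-size-weighted average of client diagonal FIMs."""
    total = sum(client_sizes)
    dim = len(client_fishers[0])
    global_fisher = [0.0] * dim
    for fisher, n_k in zip(client_fishers, client_sizes):
        weight = n_k / total  # assumed normalization of the caption's n_k / N
        for j in range(dim):
            global_fisher[j] += weight * fisher[j]
    return global_fisher


def regularized_step(theta, grad, theta_global, global_fisher, lam, lr):
    """Client side: one gradient step on task loss plus an assumed Fisher penalty.

    The penalty gradient is lam * I_G[j] * (theta[j] - theta_global[j]).
    """
    return [
        t - lr * (g + lam * f * (t - tg))
        for t, g, tg, f in zip(theta, grad, theta_global, global_fisher)
    ]


# Two clients holding 1 and 3 samples respectively.
global_fisher = aggregate_fisher([[1.0, 2.0], [3.0, 4.0]], [1, 3])

# One local step with a zero task gradient, to isolate the penalty's pull.
theta_new = regularized_step(
    theta=[1.0, 1.0],
    grad=[0.0, 0.0],
    theta_global=[0.0, 0.0],
    global_fisher=global_fisher,
    lam=0.5,
    lr=0.1,
)
```

Only diagonal FIMs and model parameters cross the network in this sketch, so the communication cost per round stays linear in the number of parameters.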

Theorems & Definitions (17)

  • Lemma 2.3: Marginal KL bound
  • Lemma 2.4: Local conditional KL quadratic expansion
  • Theorem 2.5: KL divergence bound via Fisher information
  • Theorem B.3: KL divergence bound via Fisher information
  • Proof B.1
  • Lemma B.4: Marginal KL bound
  • Proof B.2
  • Lemma B.5: Local conditional KL quadratic expansion
  • Proof B.3
  • Corollary B.6: FIRE Surrogate via Fisher Information
  • ...and 7 more