BTFL: A Bayesian-based Test-Time Generalization Method for Internal and External Data Distributions in Federated learning

Yu Zhou, Bingyan Liu

TL;DR

This work defines Test-time Generalization for Internal and External Distributions in Federated Learning (TGFL) and introduces BTFL, a Bayesian-based method that balances personalization and generalization at test time using a two-head local/global architecture. BTFL uses a three-stage pipeline (Information Extraction, Analysis, and Interpolation Prediction) with Historical Bayesian Update (HBU) and Characteristic Bayesian Update (CBU) to form a Dual Posterior Injection (DPI) that yields the interpolated prediction $Y_{int}=e\cdot Y_g + (1-e)\cdot Y_l$, where $e$ is derived from a posterior over $m$ conditioned on test data. The approach provides theoretical guarantees, requires no online optimization, and demonstrates improved accuracy across CIFAR10, OfficeHome, and ImageNet-32 with 4x–12x speedups over baselines. The authors also introduce the BTGFL benchmark to fairly evaluate TGFL scenarios, showing BTFL’s robustness to IND/EXD shifts and its scalability across CNNs, ResNets, and Transformers, highlighting strong practical impact for resource-constrained FL deployments.
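The interpolation step can be illustrated with a minimal sketch. It only shows the final combination $Y_{int}=e\cdot Y_g + (1-e)\cdot Y_l$. The function name `interpolate_predictions`, the example head outputs, and the fixed value of `e` are hypothetical. In BTFL, $e$ is derived from the posterior rather than set by hand, and the assumption that $e$ lies in $[0, 1]$ is ours.

```python
from typing import List, Sequence


def interpolate_predictions(
    y_global: Sequence[float], y_local: Sequence[float], e: float
) -> List[float]:
    """Element-wise Y_int = e * Y_g + (1 - e) * Y_l over class scores."""
    if len(y_global) != len(y_local):
        raise ValueError("both heads must output the same number of classes")
    # Assumption: e is a mixing weight in [0, 1]; the source does not state the range.
    if not 0.0 <= e <= 1.0:
        raise ValueError("e must lie in [0, 1]")
    return [e * g + (1.0 - e) * l for g, l in zip(y_global, y_local)]


# Hypothetical outputs of the global and local heads for one test sample.
y_g = [0.2, 0.5, 0.3]
y_l = [0.7, 0.2, 0.1]

# Placeholder weight; BTFL would obtain e from its Bayesian updates instead.
y_int = interpolate_predictions(y_g, y_l, e=0.25)
```

With a small `e` the result stays close to the local head, which would suit a sample resembling the client's own distribution. A larger `e` shifts the prediction toward the global head.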

Abstract

Federated Learning (FL) enables multiple clients to collaboratively develop a global model while maintaining data privacy. However, online FL deployment faces challenges due to distribution shifts and evolving test samples. Personalized Federated Learning (PFL) tailors the global model to individual client distributions but struggles with Out-Of-Distribution (OOD) samples during testing, leading to performance degradation. In real-world scenarios, balancing personalization and generalization during online testing is crucial, yet existing methods primarily focus on training-phase generalization. To address the test-time trade-off, we introduce a new scenario: Test-time Generalization for Internal and External Distributions in Federated Learning (TGFL), which evaluates adaptability under Internal Distribution (IND) and External Distribution (EXD). We propose BTFL, a Bayesian-based test-time generalization method for TGFL, which balances generalization and personalization at the sample level during testing. BTFL employs a two-head architecture to store local and global knowledge and interpolates their predictions via a dual-Bayesian framework that considers both historical test data and current sample characteristics, with theoretical guarantees and faster execution. Our experiments demonstrate that BTFL achieves improved performance across various datasets and models at lower time cost. The source code is publicly available at https://github.com/ZhouYuCS/BTFL.

Paper Structure

This paper contains 23 sections, 33 equations, 6 figures, 7 tables, 1 algorithm.

Figures (6)

  • Figure 1: Averaged $\lambda_{s}$ on different datasets with various distributions of our benchmark BTGFL.
  • Figure 2: Overview of the proposed BTFL. It mainly includes three modules: Information Extraction, Information Analysis and Interpolation Prediction.
  • Figure 3: Illustration of our benchmark (BTGFL) for evaluating test-time generalization under the FL context.
  • Figure 4: Test accuracy of baselines on Synthetical tests with different degrees of heterogeneity. A simple CNN is trained on CIFAR10. Client heterogeneity is controlled by the parameter of the Dirichlet distribution [yurochkin2019bayesian, hsu2019measuring], denoted 'Dir'.
  • Figure 5: Test accuracy of the benchmark models (ResNet20-GN and CCT4, both trained on CIFAR10 with a heterogeneity factor of Dir(0.1)). Our methods are compared against five strong competitors, with results averaged across 5 IND/EXD test sets in BTGFL.
  • ...and 1 more figure
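
The 'Dir' heterogeneity factor in the captions above refers to Dirichlet-based partitioning of labels across clients. The sketch below shows one common way such a split is built and is not taken from the BTFL code. The function names, the seed, and the toy label list are hypothetical. A smaller `alpha` (such as 0.1) concentrates each class on fewer clients, which gives a more heterogeneous split.

```python
import random
from typing import Dict, List, Sequence


def dirichlet_sample(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Draw from a symmetric Dirichlet(alpha) by normalizing Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # guard against numerical underflow for very small alpha
        return [1.0 / k] * k
    return [d / total for d in draws]


def partition_by_label(
    labels: Sequence[int], num_clients: int, alpha: float, seed: int = 0
) -> Dict[int, List[int]]:
    """Assign sample indices to clients, class by class, with Dirichlet proportions."""
    rng = random.Random(seed)
    clients: Dict[int, List[int]] = {c: [] for c in range(num_clients)}
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        props = dirichlet_sample(alpha, num_clients, rng)
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += props[c]
            # The last client takes the remainder so every index is assigned once.
            end = len(idxs) if c == num_clients - 1 else round(cum * len(idxs))
            end = max(end, start)
            clients[c].extend(idxs[start:end])
            start = end
    return clients


# Toy data: 1000 samples over 10 classes, split across 5 hypothetical clients.
labels = [i % 10 for i in range(1000)]
parts = partition_by_label(labels, num_clients=5, alpha=0.1)
```

Each client's list in `parts` holds the indices of its local training samples. Comparing per-client class counts for `alpha=0.1` and, say, `alpha=10.0` makes the effect of the heterogeneity factor visible.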