Rethinking the Starting Point: Collaborative Pre-Training for Federated Downstream Tasks

Yun-Wei Chu; Dong-Jun Han; Seyyedali Hosseinalipour; Christopher G. Brinton

TL;DR

CoPreFL rethinks initialization for federated downstream tasks: it uses a model-agnostic meta-learning (MAML) framework to pre-train a global model that serves as a robust initialization and generalizes to arbitrary heterogeneous FL tasks. It defines a variance-aware meta-objective that jointly optimizes average performance and fairness across clients, and it operates under two data-storage scenarios: purely distributed pre-training and hybrid client-server pre-training with a small server dataset. Empirical results on CIFAR-100, Tiny-ImageNet, FEMNIST, and PACS show CoPreFL achieving higher average accuracy and lower variance than the pre-training baselines, while remaining compatible with common FL algorithms. The approach is resilient to unseen labels and domain shifts, which suggests practical potential for robust FL deployments in privacy-sensitive, real-world settings.

Abstract

A few recent studies have demonstrated that leveraging centrally pre-trained models can offer advantageous initializations for federated learning (FL). However, existing pre-training methods do not generalize well when faced with an arbitrary set of downstream FL tasks. Specifically, they often (i) achieve limited average accuracy, particularly when there are unseen downstream labels, and (ii) result in significant accuracy variance, failing to provide a balanced performance across clients. To address these challenges, we propose CoPreFL, a collaborative/distributed pre-training approach which provides a robust initialization for downstream FL tasks. The key idea of CoPreFL is a model-agnostic meta-learning (MAML) procedure that tailors the global model to closely mimic heterogeneous and unseen FL scenarios, resulting in a pre-trained model that is rapidly adaptable to arbitrary FL tasks. Our MAML procedure incorporates performance variance into the meta-objective function, balancing performance across clients rather than solely optimizing for accuracy. Through extensive experiments, we demonstrate that CoPreFL obtains significant improvements in both average accuracy and variance across arbitrary downstream FL tasks with unseen/seen labels, compared with various pre-training baselines. We also show how CoPreFL is compatible with different well-known FL algorithms applied by the downstream tasks, enhancing performance in each case.
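To make the idea of a variance-aware meta-objective concrete, the following is a minimal sketch, not the authors' implementation. It assumes the meta-objective is a convex combination of the mean and the variance of per-client losses, controlled by a hypothetical weight `gamma`; the function name, the weighting scheme, and the use of population variance are illustrative assumptions that the abstract does not specify.

```python
from statistics import fmean, pvariance


def variance_aware_meta_loss(client_losses, gamma=0.5):
    """Combine average loss and loss variance across clients.

    Assumption (not from the paper text above): the two terms are mixed
    as gamma * mean + (1 - gamma) * variance, with gamma in [0, 1].
    """
    if not client_losses:
        raise ValueError("client_losses must be non-empty")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    mean_loss = fmean(client_losses)     # average performance term
    var_loss = pvariance(client_losses)  # fairness (balance) term
    return gamma * mean_loss + (1.0 - gamma) * var_loss


# Two hypothetical sets of per-client losses with the same mean (1.0).
balanced = variance_aware_meta_loss([1.0, 1.0], gamma=0.5)
unbalanced = variance_aware_meta_loss([0.5, 1.5], gamma=0.5)
```

With equal average loss, the unbalanced set receives the larger objective value, so minimizing this quantity favors models whose performance is even across clients. Setting `gamma=1.0` recovers a plain average-loss objective.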

Paper Structure (24 sections, 5 equations, 10 figures, 51 tables, 1 algorithm)

Introduction
Related Work
Proposed CoPreFL Methodology
Problem Setup and Pre-Training Objectives
CoPreFL in Scenario I (Pre-training with Distributed Clients)
CoPreFL in Scenario II (Hybrid Client-Server Pre-Training)
Experiments
Experimental Setup
Experimental Results
Conclusion
Key Applications
Detailed Procedure for CoPreFL in Scenario II
Detailed Settings for Datasets and Hyperparameters
Dataset Details
Hyperparameters and Compute Settings
...and 9 more sections

Figures (10)

Figure 1: (Left): Overview of CoPreFL, aiming to provide a robust initialization for an arbitrary set of downstream FL tasks. (Right): Average accuracy and variance achieved by FL tasks (from the Experiments section) initialized by various pre-trained models. Centralized pre-training achieves limited performance as it is not able to capture the heterogeneous characteristics of unforeseen FL settings. CoPreFL demonstrates improved performance in terms of both average accuracy and variance by strategically mimicking downstream FL scenarios during pre-training.
Figure 2: Testing accuracy distributions in various non-IID FL tasks. CoPreFL achieves the best average accuracy (i.e., right-leaning distribution) and smaller performance variance (i.e., narrower distribution) while also improving worst-performing clients' accuracies.
Figure 3: The distributions of testing accuracy in IID FL downstream tasks under various pre-training setups in scenario I on the CIFAR-100 dataset.
Figure 4: The distributions of testing accuracy in non-IID FL downstream tasks under various pre-training setups in scenario I on the CIFAR-100 dataset.
Figure 5: The distributions of testing accuracy in IID FL downstream tasks under various pre-training setups in scenario I on the Tiny-ImageNet dataset.
...and 5 more figures
