Parameterizing Federated Continual Learning for Reproducible Research

Bart Cox, Jeroen Galjaard, Aditya Shankar, Jérémie Decouchant, Lydia Y. Chen

TL;DR

This paper tackles reproducible Federated Continual Learning (FCL) in heterogeneous and evolving environments by introducing Freddie, a Kubernetes-based, open-source framework that supports data and task heterogeneity, Task-IL and Domain-IL, and scalable experiments via Kubeflow Training Operators. It identifies essential requirements for FCL emulation (usability, reproducibility, complex workloads, and resource heterogeneity) and provides novel training orchestration and extraction components along with three task-partition schemes (Column, Balanced, Shuffled). The paper demonstrates Freddie on CIFAR100 with large-scale FL and on heterogeneous task sequences, revealing significant forgetting under certain workloads and highlighting practical considerations such as co-scheduling effects and runtime variability. Overall, Freddie enables reproducible, scalable FCL research and serves as a foundation for evaluating, benchmarking, and extending FCL in realistic environments.

Abstract

Federated Learning (FL) systems operate in heterogeneous and ever-evolving environments that challenge their performance. Under real deployments, the learning tasks of clients can also evolve with time, which calls for the integration of methodologies such as Continual Learning. To enable research reproducibility, we propose a set of experimental best practices that precisely capture and emulate complex learning scenarios. Our framework, Freddie, is the first entirely configurable framework for Federated Continual Learning (FCL), and it can be seamlessly deployed on a large number of machines thanks to the use of Kubernetes and containerization. We demonstrate the effectiveness of Freddie on two use cases, (i) large-scale FL on CIFAR100 and (ii) heterogeneous task sequences in FCL, which highlight unaddressed performance challenges in FCL scenarios.

Paper Structure

This paper contains 6 sections, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Overview of Freddie. An Orchestrator and an Extractor are used for deploying experiments and collecting data, respectively. Experiments run as TrainJobs managed by Kubeflow Training Operators. Within such a job, the federator controls the experiment and the clients perform the learning.
  • Figure 2: Column, balanced and shuffled task partition schemes for CL.
  • Figure 3: Client and federator round durations with Freddie for small-scale (5-20 clients, LeNet5 & CIFAR10) and large-scale (ResNet-18 & CIFAR100) experiments. Client durations are scaled by the total number of clients (WS).
  • Figure 4: Impact of task heterogeneity on FCL.