Recurrent Early Exits for Federated Learning with Heterogeneous Clients
Royson Lee, Javier Fernandez-Marques, Shell Xu Hu, Da Li, Stefanos Laskaridis, Łukasz Dudziak, Timothy Hospedales, Ferenc Huszár, Nicholas D. Lane
TL;DR
This paper tackles device heterogeneity in federated learning (FL) by introducing ReeFL, a transformer-based recurrent early-exit module that is shared across sub-models and fuses multi-depth features into a single classifier. ReeFL enables per-client adaptive knowledge distillation by selecting the best-performing exit as the teacher, and it modulates backbone features to improve deeper predictions. The model is trained end-to-end with a unified objective that combines cross-entropy losses with a KL knowledge-transfer term. Empirically, ReeFL outperforms depth- and width-based baselines (DepthFL, InclusiveFL, ScaleFL, ExclusiveFL) on CIFAR-100, FEMNIST, and SpeechCommands with both 4 and 12 exits, while keeping communication and compute costs reasonable under both parameter-efficient fine-tuning (PEFT) and full fine-tuning. The gains are attributed to feature fusion, dynamic teacher selection, and the shared classifier architecture, which makes the approach practical for scalable FL on heterogeneous edge devices. Extensive ablations on aggregation, distillation, and feature modulation show when each of ReeFL's components contributes most to accuracy.
Abstract
Federated learning (FL) has enabled distributed learning of a model across multiple clients in a privacy-preserving manner. One of the main challenges of FL is to accommodate clients with varying hardware capacities, since clients differ in available compute and memory. To tackle this challenge, recent state-of-the-art approaches use early exits. Nonetheless, these approaches fall short of mitigating the challenges of jointly learning multiple exit classifiers, often relying on hand-picked heuristic solutions for knowledge distillation among classifiers and/or utilizing additional layers for weaker classifiers. In this work, instead of utilizing multiple classifiers, we propose a recurrent early exit approach named ReeFL that fuses features from different sub-models into a single shared classifier. Specifically, we use a transformer-based early-exit module shared among sub-models to i) better exploit multi-layer feature representations for task-specific prediction and ii) modulate the feature representation of the backbone model for subsequent predictions. We additionally present a per-client self-distillation approach where the best sub-model is automatically selected as the teacher of the other sub-models at each client. Our experiments on standard image and speech classification benchmarks across various emerging federated fine-tuning baselines demonstrate ReeFL's effectiveness over previous works.
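To make the per-client self-distillation idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that the "best" sub-model is the exit with the lowest mean cross-entropy on the client's current batch, and that the objective is the sum of per-exit cross-entropy losses plus a weighted KL term from the teacher to every other exit. The function name `self_distillation_loss`, the weight `kd_weight`, and the selection criterion are all illustrative assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def kl_divergence(log_p, log_q):
    """KL(p || q) computed from log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def self_distillation_loss(exit_logits, labels, kd_weight=1.0):
    """Sketch of a per-client objective with automatic teacher selection.

    exit_logits[e][n] holds the logits of exit e for sample n.
    Returns (total_loss, teacher_index).
    """
    n = len(labels)
    log_probs = [[log_softmax(z) for z in per_exit] for per_exit in exit_logits]

    # Mean cross-entropy of each exit on this client's batch.
    ce = [
        sum(-lp[y] for lp, y in zip(per_exit, labels)) / n
        for per_exit in log_probs
    ]

    # Assumption: the teacher is the exit with the lowest cross-entropy.
    teacher = min(range(len(ce)), key=ce.__getitem__)

    # KL term from the teacher to every other exit. In a real training loop
    # the teacher's distribution would be treated as a constant target.
    kd = 0.0
    for e, per_exit in enumerate(log_probs):
        if e == teacher:
            continue
        kd += sum(
            kl_divergence(t, s) for t, s in zip(log_probs[teacher], per_exit)
        ) / n

    return sum(ce) + kd_weight * kd, teacher
```

For example, calling `self_distillation_loss([[[0.0, 0.0], [0.0, 0.0]], [[5.0, 0.0], [0.0, 5.0]]], [0, 1])` selects the second exit (index 1) as the teacher, because its predictions are confident and correct while the first exit is uniform. Because selection is recomputed from each client's own data, different clients may end up with different teachers, which is the sense in which the distillation is per-client.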
