Towards Optimal Heterogeneous Client Sampling in Multi-Model Federated Learning

Haoran Zhang, Zejun Gong, Zekai Li, Marie Siew, Carlee Joe-Wong, Rachid El-Azouzi

TL;DR

This paper tackles multi-model federated learning (MMFL) under heterogeneous client constraints by first establishing a convergence analysis for MMFL with arbitrary client sampling and then proposing loss-based variance-reduced sampling (MMFL-LVR) to minimize per-round variance while honoring server and client budgets. To further stabilize training, it introduces MMFL-StaleVR, which optimally leverages stale updates, and MMFL-StaleVRE, a low-overhead variant that approximates the optimal stale-weighting using only active clients. Empirical results on Fashion-MNIST, EMNIST, CIFAR-10, and Shakespeare show MMFL-LVR and especially MMFL-StaleVR achieving up to 19.1% higher average accuracy than random scheduling and within 5.4% of full participation, demonstrating robust performance under diverse data distributions and resource heterogeneity. Overall, the work provides a principled, scalable framework for efficiently coordinating concurrent model training across heterogeneous MMFL deployments, with practical implications for edge deployments under limited bandwidth and compute resources.

Abstract

Federated learning (FL) allows edge devices to collaboratively train models without sharing local data. As FL gains popularity, clients may need to train multiple unrelated FL models, but communication constraints limit their ability to train all models simultaneously. While clients could train FL models sequentially, opportunistically having FL clients concurrently train different models -- termed multi-model federated learning (MMFL) -- can reduce the overall training time. Prior work uses simple client-to-model assignments that do not optimize the contribution of each client to each model over the course of its training. Prior work on single-model FL shows that intelligent client selection can greatly accelerate convergence, but naïve extensions to MMFL can violate heterogeneous resource constraints at both the server and the clients. In this work, we develop a novel convergence analysis of MMFL with arbitrary client sampling methods, theoretically demonstrating the strengths and limitations of previous well-established gradient-based methods. Motivated by this analysis, we propose MMFL-LVR, a loss-based sampling method that minimizes training variance while explicitly respecting communication limits at the server and reducing computational costs at the clients. We extend this to MMFL-StaleVR, which incorporates stale updates for improved efficiency and stability, and MMFL-StaleVRE, a lightweight variant suitable for low-overhead deployment. Experiments show our methods improve average accuracy by up to 19.1% over random sampling, with only a 5.4% gap from the theoretical optimum (full client participation).
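
To make the idea of loss-based sampling under a server communication budget concrete, the following is a minimal sketch only. It is not the paper's Theorem 2 solution, which is not reproduced on this page. It assumes a single model, non-negative per-client losses, and a generic rule: inclusion probabilities proportional to loss, capped at 1, and summing to the budget. The function name `capped_proportional_probs` and its arguments are hypothetical.

```python
def capped_proportional_probs(losses, budget):
    """Inclusion probabilities proportional to `losses`, each capped at 1.

    Assumption (not from the paper): losses are non-negative and `budget`
    is the expected number of clients the server can hear from per round.
    The returned probabilities sum to min(budget, len(losses)).
    """
    n = len(losses)
    if budget >= n:
        return [1.0] * n  # the budget covers everyone: full participation

    probs = [0.0] * n
    free = set(range(n))       # clients whose probability is not yet fixed
    remaining = float(budget)  # budget left for the free clients

    while free:
        total = sum(losses[i] for i in free)
        if total <= 0:
            # All remaining losses are zero: share the leftover budget evenly.
            for i in free:
                probs[i] = remaining / len(free)
            break
        # Clients whose proportional share would reach or exceed 1 get capped.
        capped = [i for i in free if remaining * losses[i] / total >= 1.0]
        if not capped:
            for i in free:
                probs[i] = remaining * losses[i] / total
            break
        for i in capped:
            probs[i] = 1.0
            free.discard(i)
            remaining -= 1.0
    return probs


if __name__ == "__main__":
    # Three clients, expected budget of two uploads per round.
    print(capped_proportional_probs([4.0, 1.0, 1.0], budget=2))
```

If each client is then sampled independently with its probability, the budget holds in expectation rather than as a hard per-round cap. The paper's methods additionally handle multiple models and client-side constraints, which this sketch ignores.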

Paper Structure

This paper contains 22 sections, 10 theorems, 99 equations, 6 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Let $w_s^*$ denote the optimal weights of model $s$. If the learning rate $\eta_{\tau,s}=\frac{16}{\mu } \frac{1}{(\tau+1)K+\gamma_{\tau,s}}$, then We define $\gamma_{\tau,s}=\max \{\frac{32L}{\mu},4K\sum_{i\in \mathcal{N}_s}\sum_{b=1}^{B_i} \mathbbm{1}_{(i,b)}^{s,\tau} P_{(i,b),s}^\tau\}$, $V_\tau=\max\{\gamma_\tau^2 \mathbb{E}(\|w_s^0-w_s^*\|^2), (\frac{16}{\mu})^2\sum_{\tau'=0}^{\tau-1}z_{\tau
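
The learning-rate schedule and the definition of $\gamma_{\tau,s}$ quoted above can be evaluated directly. The sketch below is illustrative only. It assumes that the indicator $\mathbbm{1}_{(i,b)}^{s,\tau}$ selects the client-processor pairs active for model $s$ in round $\tau$, so that the double sum reduces to a sum of the coefficients $P_{(i,b),s}^\tau$ over active pairs. The function and argument names are hypothetical, and the constants $L$, $\mu$, and $K$ are taken as given inputs because their definitions do not appear in this excerpt.

```python
def gamma_value(L, mu, K, active_coeffs):
    """gamma_{tau,s} = max(32 L / mu, 4 K * sum of P over active pairs).

    `active_coeffs` holds P_{(i,b),s}^tau for the pairs whose indicator is 1
    (an assumed reading of the indicator in the theorem statement).
    """
    return max(32.0 * L / mu, 4.0 * K * sum(active_coeffs))


def learning_rate(tau, K, mu, gamma):
    """eta_{tau,s} = (16 / mu) * 1 / ((tau + 1) K + gamma_{tau,s})."""
    return (16.0 / mu) / ((tau + 1) * K + gamma)


if __name__ == "__main__":
    # Made-up constants, chosen only to show the shape of the schedule.
    L, mu, K = 1.0, 1.0, 5
    g = gamma_value(L, mu, K, active_coeffs=[0.5, 0.5])
    for tau in range(3):
        print(tau, learning_rate(tau, K, mu, g))
```

With $\gamma_{\tau,s}$ held fixed, the rate decays roughly as $1/\tau$, since the denominator grows linearly in the round index.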

Figures (6)

  • Figure 1: Overview of the MMFL system for an example with $S = 3$ models. In each global round $\tau$, the server probabilistically assigns models to a subset of processors at the FL clients. Models that each client has the data to train are shown at each client, and faded models indicate ones that have not been assigned in this training round.
  • Figure 2: Comparison of the summed global step size of all models ($\sum_{s=1}^S \|H_{\tau,s}\|_1=\sum_{s=1}^S \sum_{(i,b)\in \mathcal{A}_{\tau,s}} P_{(i,b),s}^\tau$). Detailed experiment settings are the same as described in Section \ref{sec:setup}. Left: 3-model setting. Right: 5-model setting. MMFL-GVR's global step size is unstable, potentially harming training stability. In contrast, MMFL-LVR's participation heterogeneity is much lower, leading to more stable convergence.
  • Figure 3: Optimal $\beta_i^t$ for 2 clients across training rounds. In each subplot, the blue curve represents the optimal $\beta_i^t$ computed using Eq. \ref{eq:solutionStaleb}, while the red stars indicate the rounds during which the client was active. Experiment setting: EMNIST ($S = 1$, the same setting as Section \ref{sec:fixedDistribution}).
  • Figure 4: Comparison of MMFL-GVR and RoundRobin-GVR: target accuracy vs. required global rounds. A lower curve indicates that the MMFL algorithm reaches the target accuracy faster than the SMFL algorithm. Left: 3-model setting. Right: 5-model setting. RoundRobin-GVR fails to achieve target accuracies of 0.5 and 0.55 within 150 rounds.
  • Figure 5: Effect of staleness weights with a fixed sampling distribution. We evaluate the effect of our dynamic staleness weights (Eq. \ref{eq:solutionStaleb}) with a fixed client sampling distribution using EMNIST ($S=1$). Clients are divided into two groups with participation rates of 4% and 16%. MMFL-StaleVR achieves a final accuracy of 0.74, outperforming FedStale and FedVARP, which use static weights for the stale updates (max 0.71).
  • ...and 1 more figure
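
The quantity plotted in Figure 2, $\sum_{s=1}^S \sum_{(i,b)\in \mathcal{A}_{\tau,s}} P_{(i,b),s}^\tau$, is a plain double sum and can be computed per round as sketched below. The data layout (a dictionary per model, keyed by active client-processor pair) and the function name are assumptions made for illustration, and the numbers are made up.

```python
def summed_global_step_size(active_coeffs_by_model):
    """Sum P_{(i,b),s}^tau over all models s and all active pairs (i, b).

    `active_coeffs_by_model` maps a model index s to a dict that maps each
    active (client, processor) pair in A_{tau,s} to its coefficient P.
    """
    return sum(
        sum(coeffs.values()) for coeffs in active_coeffs_by_model.values()
    )


if __name__ == "__main__":
    # One round with S = 2 models; values are illustrative only.
    round_coeffs = {
        0: {(0, 0): 0.5, (1, 0): 1.5},
        1: {(2, 1): 2.0},
    }
    print(summed_global_step_size(round_coeffs))
```

Tracking this value across rounds is one way to see the instability the caption attributes to MMFL-GVR: large swings in the sum mean the effective global step size varies a lot from round to round.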

Theorems & Definitions (13)

  • Remark 1: Independent processor sampling
  • Definition 1: Non-iid data distribution
  • Theorem 1: Convergence
  • Remark 2: Upper bound convergence to zero
  • Theorem 2: Optimal MMFL-LVR assignment probabilities
  • Theorem 3: MMFL-StaleVR optimal solution
  • Theorem 4: Convergence
  • Lemma 5
  • Lemma 6
  • Lemma 7
  • ...and 3 more