ProbSelect: Stochastic Client Selection for GPU-Accelerated Compute Devices in the 3D Continuum
Andrija Stanisic, Stefan Nastic
TL;DR
This work tackles the straggler problem in GPU-accelerated Federated Learning across the 3D Compute Continuum. It introduces ProbSelect, a probabilistic client selection framework built on an Analytical Latency Model (ALM) that estimates per-device latency without historical data. ALM decomposes latency into download, compute, and upload components and introduces an efficiency threshold $\eta_i^{\text{th}}$ that determines whether device $i$ can meet the deadline. Assuming FLOP utilization $\eta_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$, ProbSelect computes the probability $p_i = P(\tau_i \leq \tau^{\text{slo}})$ that the latency $\tau_i$ of device $i$ stays within the deadline $\tau^{\text{slo}}$, and selects the devices with $p_i \ge p^{\text{slo}}$. Evaluation across diverse GPUs and CNN workloads shows ALM achieving a MAPE below 5%, and ProbSelect improving SLO compliance by 13.77% on average while reducing computational waste by up to 72.5% compared to baselines such as FedLim. The approach enables deadline-aware, GPU-centric client selection without continuous monitoring or historical data, which is particularly valuable in dynamic edge–cloud–space environments. It does not optimize end-to-end convergence; future work will explore broader FLOP-efficiency distributions and workload coverage.
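The selection rule in the TL;DR can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that compute time equals the workload's FLOPs divided by peak throughput times utilization $\eta_i$, so that $\tau_i \le \tau^{\text{slo}}$ is equivalent to $\eta_i \ge \eta_i^{\text{th}}$ and $p_i = 1 - \Phi\big((\eta_i^{\text{th}} - \mu_i)/\sigma_i\big)$. The `Device` fields, function names, and all numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical per-device description; field names are assumptions."""
    name: str
    peak_flops: float  # peak throughput in FLOP/s
    mu: float          # mean FLOP utilization (mu_i)
    sigma: float       # std. dev. of FLOP utilization (sigma_i)
    t_down: float      # estimated model download time in seconds
    t_up: float        # estimated update upload time in seconds


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def efficiency_threshold(dev: Device, work_flops: float, tau_slo: float) -> float:
    """Smallest utilization at which the device still meets the deadline.

    Assumed latency model (not taken from the paper):
        tau = t_down + work_flops / (peak_flops * eta) + t_up
    """
    compute_budget = tau_slo - dev.t_down - dev.t_up
    if compute_budget <= 0:
        return math.inf  # transfers alone already exceed the deadline
    return work_flops / (dev.peak_flops * compute_budget)


def success_probability(dev: Device, work_flops: float, tau_slo: float) -> float:
    """p_i = P(tau_i <= tau_slo) = P(eta_i >= eta_th) under the normal assumption.

    The normal distribution is not truncated to [0, 1] in this sketch.
    """
    eta_th = efficiency_threshold(dev, work_flops, tau_slo)
    if math.isinf(eta_th):
        return 0.0
    return 1.0 - normal_cdf(eta_th, dev.mu, dev.sigma)


def select_clients(devices, work_flops: float, tau_slo: float, p_slo: float):
    """Return the names of devices whose success probability reaches p_slo."""
    return [
        d.name
        for d in devices
        if success_probability(d, work_flops, tau_slo) >= p_slo
    ]


if __name__ == "__main__":
    fleet = [
        Device("fast", peak_flops=1e13, mu=0.5, sigma=0.1, t_down=1.0, t_up=1.0),
        Device("slow", peak_flops=1e12, mu=0.5, sigma=0.1, t_down=1.0, t_up=1.0),
    ]
    for d in fleet:
        print(d.name, success_probability(d, work_flops=2e13, tau_slo=12.0))
    print(select_clients(fleet, work_flops=2e13, tau_slo=12.0, p_slo=0.9))
```

Under these assumptions, selection needs only static device parameters and the workload's FLOP count, which is in line with the claim that no historical data or continuous monitoring is required.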
Abstract
The integration of edge, cloud, and space devices into a unified 3D continuum poses significant challenges for client selection in federated learning systems. Traditional approaches rely on continuous monitoring and historical data collection, which become impractical in dynamic environments where satellites and mobile devices frequently change operational conditions. Furthermore, existing solutions primarily consider CPU-based computation and fail to capture the complex characteristics of GPU-accelerated training, which is prevalent across the 3D continuum. This paper introduces ProbSelect, a novel approach that uses analytical modeling and probabilistic forecasting for client selection on GPU-accelerated devices, without requiring historical data or continuous monitoring. We model client selection within user-defined SLOs. Extensive evaluation across diverse GPU architectures and workloads demonstrates that ProbSelect improves SLO compliance by 13.77% on average while achieving 72.5% computational waste reduction compared to baseline approaches.
