FedRTS: Federated Robust Pruning via Combinatorial Thompson Sampling

Hong Huang, Hai Yang, Yuan Chen, Jiaxun Ye, Dapeng Wu

TL;DR

FedRTS addresses federated pruning in resource-constrained, non-IID settings by reframing topology adjustment as a combinatorial multi-armed bandit problem and introducing a Thompson Sampling-based Adjustment (TSAdj) mechanism. TSAdj maintains a posterior distribution for each link, samples probabilistic actions from these posteriors, and fuses global and client-specific information to stabilize sparse topologies, while limiting communication to top-gradient indices. Training follows a two-loop process in which TSAdj guides pruning in the outer loop, and the analysis provides a regret upper bound for TSAdj. Experiments show state-of-the-art accuracy and lower communication costs on CV and NLP tasks under heterogeneous data and partial client participation.

Abstract

Federated Learning (FL) enables collaborative model training across distributed clients without data sharing, but its high computational and communication demands strain resource-constrained devices. While existing methods use dynamic pruning to improve efficiency by periodically adjusting sparse model topologies while maintaining sparsity, these approaches suffer from issues such as greedy adjustments, unstable topologies, and communication inefficiency, resulting in less robust models and suboptimal performance under data heterogeneity and partial client availability. To address these challenges, we propose Federated Robust pruning via combinatorial Thompson Sampling (FedRTS), a novel framework designed to develop robust sparse models. FedRTS enhances robustness and performance through its Thompson Sampling-based Adjustment (TSAdj) mechanism, which makes probabilistic decisions informed by stable, farsighted information, in contrast to the deterministic decisions of previous methods, which rely on unstable and myopic information. Extensive experiments demonstrate that FedRTS achieves state-of-the-art performance in computer vision and natural language processing tasks while reducing communication costs, particularly excelling in scenarios with heterogeneous data distributions and partial client participation. Our code is available at: https://github.com/Little0o0/FedRTS
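
The idea of replacing deterministic magnitude-based decisions with sampled ones can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Beta posterior per link, a fractional outcome in [0, 1] per link, and a hypothetical fusion weight `gamma` that mixes a global vote (top-k weight magnitudes) with client votes (each client's top-gradient indices). The function names `sample_topology`, `update_posterior`, and `fused_outcome` are illustrative only.

```python
import numpy as np

def sample_topology(alpha, beta, k, rng):
    """Draw one Thompson sample per link and activate the k largest draws."""
    theta = rng.beta(alpha, beta)              # one posterior sample per link
    active = np.argpartition(theta, -k)[-k:]   # indices of the k largest samples
    mask = np.zeros(alpha.shape, dtype=bool)
    mask[active] = True
    return mask

def update_posterior(alpha, beta, outcome):
    """Beta update with fractional outcomes in [0, 1] (assumed outcome model)."""
    return alpha + outcome, beta + (1.0 - outcome)

def fused_outcome(global_weights, client_top_indices, k, gamma=0.5):
    """Hypothetical fusion of global and client information into one outcome."""
    n = global_weights.size
    # Global vote: 1 for links among the k largest aggregated weight magnitudes.
    global_vote = np.zeros(n)
    global_vote[np.argpartition(np.abs(global_weights), -k)[-k:]] = 1.0
    # Client vote: fraction of clients that reported the link as a top-gradient index.
    client_vote = np.zeros(n)
    for idx in client_top_indices:
        client_vote[idx] += 1.0
    if client_top_indices:
        client_vote /= len(client_top_indices)
    return gamma * global_vote + (1.0 - gamma) * client_vote

rng = np.random.default_rng(0)
n_links, k = 20, 5
alpha = np.ones(n_links)                       # uniform Beta(1, 1) prior per link
beta = np.ones(n_links)
weights = rng.normal(size=n_links)             # stand-in for aggregated weights
client_top = [np.array([0, 1, 2]), np.array([1, 2, 3])]  # indices sent by clients

outcome = fused_outcome(weights, client_top, k)
alpha, beta = update_posterior(alpha, beta, outcome)
mask = sample_topology(alpha, beta, k, rng)
print("active links:", np.flatnonzero(mask))
```

Because each client sends only a short list of indices rather than dense gradients, the upload in this sketch scales with the number of reported indices, not with the model size. The posterior accumulates evidence across rounds, so a single noisy round shifts the sampled topology less than a purely magnitude-based rule would.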

Paper Structure

This paper contains 48 sections, 8 theorems, 35 equations, 17 figures, 4 tables, and 3 algorithms.

Key Result

Theorem 3.4

(Upper Bound) Under assumptions ass: mag, ass: top, and ass: L, and with outcomes $X_{t}$ defined in Eq. eq: x, the regret $Reg(T)$ of TSAdj can be upper bounded for any $\epsilon$ such that $\Delta_{s,k} > 2LK\epsilon$ for all $s \neq s^*_1$ with $s \notin S^*_{k}$, where $U$ is a universal constant in the bound.
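
For orientation, cumulative regret in combinatorial bandit analyses is commonly written in the following generic form. The symbols $r$, $\mu$, $S^*$, and $S_t$ are assumptions taken from the standard combinatorial Thompson Sampling setting and may not match the paper's exact definition:

$$
Reg(T) = \mathbb{E}\left[\sum_{t=1}^{T} \bigl( r(S^*, \mu) - r(S_t, \mu) \bigr)\right]
$$

Here $S_t$ denotes the set of arms (links) selected in round $t$, $S^*$ an optimal set, $\mu$ the vector of true mean outcomes, and $r$ the expected reward of a set. A bound on $Reg(T)$ limits how much reward is lost over $T$ rounds relative to always selecting $S^*$.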

Figures (17)

  • Figure 1: Illustration of topology adjustment in existing baselines and FedRTS. Left: Existing baselines adjust the model topology based on myopic and unstable aggregated weights and gradients via deterministic magnitude pruning and reactivation, resulting in greedy adjustment and high communication overhead. Right: FedRTS introduces Thompson Sampling-based Adjustment (TSAdj) to adjust the topology based on farsighted and stable probability distributions, achieving a robust topology and low communication overhead.
  • Figure 1: Performance comparison on TinyStories with GPT-2-32M.
  • Figure 2: Overview of FedRTS, which integrates TSAdj and uses two-loop updating to develop a robust sparse model.
  • Figure 2: Performance under different adjustment intervals ($\Delta T$).
  • Figure 3: Testing accuracy of FedRTS and different federated pruning baselines on the four CV datasets with different densities, where the ratio is communication cost relative to dense FedAVG.
  • ...and 12 more figures

Theorems & Definitions (8)

  • Theorem 3.4
  • Lemma C.1
  • Lemma C.2
  • Lemma C.3
  • Lemma C.4
  • Lemma C.5
  • Lemma C.6
  • Lemma C.7