Multi-Server FL with Overlapping Clients: A Latency-Aware Relay Framework

Yun Ji, Zeyu Chen, Xiaoxiong Zhong, Yanan Ma, Sheng Zhang, Yuguang Fang

TL;DR

The paper addresses latency and convergence challenges in multi-server federated learning with overlapping edge-server coverage. It introduces a cloud-free framework that uses Overlapping Clients (OCs) as relay nodes to propagate edge models across adjacent ESs, accompanied by a theoretical convergence bound that ties error to propagation depth. A conflict-graph-based local search algorithm optimizes relay routing and transmission timings to maximize cross-ES dissemination within latency constraints. Empirical results on MNIST and CIFAR-10 show faster convergence and higher accuracy than baselines, with more pronounced gains as the number of cells grows, enabling deeper model propagation without extra links.

Abstract

Multi-server Federated Learning (FL) has emerged as a promising solution to mitigate the communication bottlenecks of single-server FL. In a typical multi-server FL architecture, the regions covered by different edge servers (ESs) may overlap, so clients located in the overlapping areas can access edge models from multiple ESs. Building on this observation, we propose a cloud-free multi-server FL framework that leverages Overlapping Clients (OCs) as relays for inter-server model exchange while they upload their locally updated models to the ESs. This enables ES models to be relayed across multiple hops through neighboring ESs by OCs without introducing new communication links. We derive a new convergence upper bound for non-convex objectives under non-IID data and an arbitrary number of cells, which explicitly quantifies the impact of inter-server propagation depth on convergence error. Guided by this theoretical result, we formulate an optimization problem that aims to maximize the dissemination range of each ES model among all ESs within a limited latency. To solve this problem, we develop a conflict-graph-based local search algorithm that optimizes the routing strategy and schedules the transmission times from each ES to its neighboring ESs, achieving the widest possible transmission coverage for each model. Extensive experimental results show that our scheme outperforms existing methods.
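To make the notion of "dissemination range within a limited latency" concrete, the following is a minimal sketch, not the paper's conflict-graph-based local search. It assumes a hypothetical graph in which two ESs are adjacent when they share at least one OC, and in which each relay hop has a fixed, made-up latency. It ignores transmission conflicts and scheduling entirely, and only computes which ESs a given ES model could reach before a latency budget expires. The names `dissemination_range`, `hop_latency`, and `budget` are illustrative and do not come from the paper.

```python
import heapq


def dissemination_range(hop_latency, source, budget):
    """Earliest arrival time of the source ES model at every ES reachable within budget.

    hop_latency: dict mapping ES id -> {neighbor ES id: relay latency via a shared OC}
                 (hypothetical values; the paper's latency model is not reproduced here).
    source:      id of the ES whose model is being propagated.
    budget:      maximum total latency allowed for propagation.
    """
    arrival = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        t, es = heapq.heappop(heap)
        # Skip stale heap entries that were superseded by a faster route.
        if t > arrival.get(es, float("inf")):
            continue
        for nbr, lat in hop_latency.get(es, {}).items():
            t_new = t + lat
            # Only keep routes that respect the latency budget and improve arrival time.
            if t_new <= budget and t_new < arrival.get(nbr, float("inf")):
                arrival[nbr] = t_new
                heapq.heappush(heap, (t_new, nbr))
    return arrival


# Hypothetical chain of four cells: ES0 - ES1 - ES2 - ES3.
hop_latency = {
    0: {1: 1.0},
    1: {0: 1.0, 2: 1.5},
    2: {1: 1.5, 3: 2.0},
    3: {2: 2.0},
}
reach = dissemination_range(hop_latency, source=0, budget=3.0)
print(sorted(reach))  # ESs that receive the model of ES0 within the budget
```

In this toy chain the model of ES0 reaches ES1 and ES2 but not ES3, because the third hop would exceed the budget. In the framework described above, relays additionally contend for transmission opportunities, which is what the conflict graph and the transmission-time scheduling are meant to handle.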

Paper Structure

This paper contains 10 sections, 1 theorem, 32 equations, 2 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

Suppose Assumption 1 holds, assume $\sum_{i=1}^{C}P_{y=i}\lambda_{x|y=i} = \lambda$, and let $\bm{w}^{*}$ denote the optimal global model. If the learning rate satisfies $\eta_{r,e} = \frac{1}{(r+1)(E-1)}$ for all $r$ and $e$, then the convergence bound of the theorem holds, where $F_{R-1}^{(l)} = \sum_{j=1}^{L}\left|\frac{p_{R-1}^{(j,q)}\hat{N}_{R-1}^{(f_j)}}{\sum_{j=1}^{L} p_{R-1}^{(j,q)}\hat{N}_{R-1}^{(f_j)}} - \frac{\hat{N}_{R-1}^{(

Figures (2)

  • Figure 1: Illustration of our proposed algorithm.
  • Figure 2: Average test accuracy versus training time.

Theorems & Definitions (3)

  • Theorem 1
  • proof
  • proof