From Federated Learning to X-Learning: Breaking the Barriers of Decentrality Through Random Walks

Allan Salihovic, Payam Abdisarabshali, Michael Langberg, Seyyedali Hosseinalipour

TL;DR

XL reframes distributed ML by letting ML models act as autonomous walkers that traverse networks via multi-hop device-to-device (D2D) transfers, thereby breaking rigid central-server or one-hop communication patterns. The framework formalizes a design space rich in degrees of freedom (DoF), including walker count, transition policies, local update budgets, memory, and inter-walker collaboration, and demonstrates single- and multi-walker strategies with elastic, perception-aware, and memory-enabled dynamics. Theoretical convergence results for time-varying transition matrices show $O(1/\sqrt{K})$ rates under standard assumptions, with corollaries linking visitation frequencies to data proportions. Practically, XL offers resource-efficient, topology-aware, scalable learning for heterogeneous, dynamic networks, with potential impacts on 6G IoT, vehicular networks, and large-scale social graphs.

Abstract

We provide our perspective on X-Learning (XL), a novel distributed learning architecture that generalizes and extends the concept of decentralization. Our goal is to present a vision for XL, introducing its unexplored design considerations and degrees of freedom. To this end, we shed light on the intuitive yet non-trivial connections between XL, graph theory, and Markov chains. We also present a series of open research directions to stimulate further research.
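The link between random walks, Markov chains, and data proportions mentioned above can be illustrated with a small sketch. It is not the authors' transition policy: it assumes a Metropolis-Hastings rule (a standard construction, swapped in here for illustration) on a hypothetical 4-node graph with made-up local dataset sizes, and checks that a single walker's long-run visitation frequencies approach each node's share of the data.

```python
import random

def metropolis_hastings_matrix(neighbors, target):
    """Row-stochastic transition matrix on an undirected graph (no self-loops
    in `neighbors`) whose stationary distribution equals `target`."""
    n = len(neighbors)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        deg_i = len(neighbors[i])
        for j in neighbors[i]:
            deg_j = len(neighbors[j])
            # Propose a uniformly random neighbor, then accept with the
            # Metropolis-Hastings ratio for the target distribution.
            accept = min(1.0, (target[j] * deg_i) / (target[i] * deg_j))
            P[i][j] = accept / deg_i
        # Rejected proposals keep the walker at its current node.
        P[i][i] = max(0.0, 1.0 - sum(P[i]))
    return P

def simulate_visits(P, steps, start=0, seed=0):
    """Run one walker for `steps` hops and return empirical visit frequencies."""
    rng = random.Random(seed)
    n = len(P)
    counts = [0] * n
    node = start
    for _ in range(steps):
        node = rng.choices(range(n), weights=P[node])[0]
        counts[node] += 1
    return [c / steps for c in counts]

# Hypothetical topology and dataset sizes (illustrative values only).
neighbors = [[1, 2], [0, 2, 3], [0, 1], [1]]
data_sizes = [100, 300, 200, 400]
target = [s / sum(data_sizes) for s in data_sizes]

P = metropolis_hastings_matrix(neighbors, target)
freqs = simulate_visits(P, steps=100_000)

for node, (f, t) in enumerate(zip(freqs, target)):
    print(f"node {node}: visited {f:.3f} of the time, data share {t:.3f}")
```

Under these assumptions the walker needs only each node's degree and data share for itself and its neighbors, which is one reason this kind of rule is a common baseline for decentralized walks; other transition policies would yield different visitation profiles.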

Paper Structure

This paper contains 22 sections, 2 theorems, 103 equations, 6 figures, 2 tables.

Key Result

Theorem 1

Assume that the learning rate satisfies $\eta_k < \min\left\{\frac{1}{2 \beta}\sqrt{\frac{\zeta^{(k)}}{\left(1+ \zeta^{(k)}\right) \ell^{(k)} \left(\ell^{(k)}-1\right)}}, \frac{1}{2\beta} \right\}$, where $\zeta^{(k)}\in (0,1/4)$ is a constant. Also, let $\left(\sigma_u\right)^2$ den
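The step-size condition in Theorem 1 can be evaluated numerically. The sketch below assumes, since the excerpt does not define them, that $\beta$ is a positive smoothness constant and that $\ell^{(k)}$ is an integer count of local SGD iterations with $\ell^{(k)} \ge 1$; the function name and the sample values are hypothetical.

```python
import math

def step_size_bound(beta, ell, zeta):
    """Upper bound on eta_k from the condition in Theorem 1.

    Assumptions (not stated in the excerpt): beta > 0 is a smoothness
    constant and ell >= 1 is an integer number of local SGD iterations.
    The theorem requires eta_k to be strictly below the returned value.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if not 0.0 < zeta < 0.25:
        raise ValueError("zeta must lie in (0, 1/4)")
    if ell < 1:
        raise ValueError("ell must be at least 1")
    cap = 1.0 / (2.0 * beta)
    if ell == 1:
        # The first term's denominator vanishes, so only the cap applies.
        return cap
    first = cap * math.sqrt(zeta / ((1.0 + zeta) * ell * (ell - 1)))
    return min(first, cap)

# Illustrative values only.
for ell in (1, 2, 5, 10):
    print(ell, step_size_bound(beta=10.0, ell=ell, zeta=0.2))
```

Under these assumptions the bound tightens as the number of local iterations grows, so longer local update budgets call for smaller step sizes.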

Figures (6)

  • Figure 1: Centralized to semi/fully-decentralized FedL/FogL architectures (A)-(C), and the extension presented by the $\mathbb{X}$L (D). In centralized to semi/fully-decentralized FedL/FogL, each node has its own local ML model, whereas in $\mathbb{X}$L there are only two ML models associated with the random walkers, inducing two active sessions per training round.
  • Figure 2: Performance of random walkers under different node traversal strategies for CIFAR-10 and SVHN datasets (smoothed using a moving average with a window size of 10). Among methods with fixed/static node importance values (red, blue, green, magenta, and yellow curves using static sampling or fixed $Z \in \{0, 0.5, 1\}$), our approach with non-binary weighting of data and spatial quality metrics (yellow curve, $Z = 0.5$) yields the best performance. Furthermore, our method with dynamic node importance (black curve using a time-varying $Z^{(k)}_{\mathsf{Inst}}$ based on current model accuracy) outperforms all the static strategies.
  • Figure 3: Performance comparison between our elastic random walker, where the number of SGD iterations is adaptively scaled based on node data quality, and methods with fixed SGD iteration numbers for CIFAR-10 and SVHN datasets (smoothed using a moving average with a window size of 10). Our elastic walker (blue curves) outperforms all baselines with static SGD iteration numbers (as depicted in the top plot for each dataset) while performing fewer SGD updates (as depicted in the bottom plot for each dataset).
  • Figure 4: Performance comparisons (smoothed using a moving average with a window size of 10) between a memory-enabled random walker and a memoryless walker, both using the same node traversal strategy for CIFAR-10 and SVHN datasets. Initially, memory is disabled (red region) to accelerate early training. As training progresses, memory influence gradually increases (green and purple regions), helping mitigate model bias and yielding notable gains -- especially in the mid to late stages of training.
  • Figure 5: Impact of the number of random walkers on model convergence for CIFAR-10 and SVHN datasets. The top plot of each dataset shows the instantaneous average performance across all walkers, where increasing the number of walkers initially boosts performance; however, beyond a certain point (e.g., from 10 to 14 walkers), the improvements become marginal. The bottom plot of each dataset presents the walkers' final accuracies, highlighting an initial sharp gain followed by diminishing gains as more walkers are deployed.
  • ...and 1 more figure

Theorems & Definitions (5)

  • Definition 1: Local Data Variability
  • proof
  • Corollary 1: Guaranteed Convergence of $\mathbb{X}$L
  • proof
  • Lemma 1: Smooth Function Characteristics