Decentralized Federated Learning with Model Caching on Mobile Agents

Xiaoyu Wang, Guojun Xiong, Houwei Cao, Jian Li, Yong Liu

TL;DR

This work tackles the slow convergence of decentralized federated learning (DFL) in highly mobile networks by introducing Cached-DFL, which adds delay-tolerant-networking (DTN)-style model spreading through per-agent caches. Agents exchange both their current and cached models upon encounters and perform local aggregation over all cached models; the convergence guarantee accounts for the model staleness introduced by caching. The approach is validated in vehicular-network simulations with MNIST, FashionMNIST, and CIFAR-10 under non-i.i.d., i.i.d., and Dirichlet data distributions, showing faster convergence than DFL without caching and performance close to centralized FL, especially under heterogeneity. Group-based caching further improves performance in grouped mobility scenarios, underscoring Cached-DFL’s practicality for real-world V2V and edge environments where traditional DFL struggles with sporadic connectivity and non-i.i.d. data.

Abstract

Federated Learning (FL) trains a shared model using data and computation power on distributed agents coordinated by a central server. Decentralized FL (DFL) utilizes local model exchange and aggregation between agents to reduce the communication and computation overheads on the central server. However, when agents are mobile, the communication opportunity between agents can be sporadic, largely hindering the convergence and accuracy of DFL. In this paper, we propose Cached Decentralized Federated Learning (Cached-DFL) to investigate delay-tolerant model spreading and aggregation enabled by model caching on mobile agents. Each agent stores not only its own model, but also models of agents encountered in the recent past. When two agents meet, they exchange their own models as well as the cached models. Local model aggregation utilizes all models stored in the cache. We theoretically analyze the convergence of Cached-DFL, explicitly taking into account the model staleness introduced by caching. We design and compare different model caching algorithms for different DFL and mobility scenarios. We conduct detailed case studies in a vehicular network to systematically investigate the interplay between agent mobility, cache staleness, and model convergence. In our experiments, Cached-DFL converges quickly, and significantly outperforms DFL without caching.
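
The exchange-and-aggregate cycle described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `CachedModel`, `Agent`, and `meet` are hypothetical, models are flat lists of floats, aggregation is a plain unweighted average, and the cache evicts the entry with the oldest timestamp when full. The paper designs and compares several caching algorithms, and its aggregation weights may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CachedModel:
    owner: int            # id of the agent that trained this model
    weights: List[float]  # flattened parameters (toy stand-in for a real model)
    timestamp: int        # time at which the owner produced the model


class Agent:
    def __init__(self, agent_id: int, weights: List[float],
                 cache_size: int, tau_max: int) -> None:
        self.id = agent_id
        self.weights = list(weights)
        self.cache_size = cache_size  # assumed fixed cache capacity
        self.tau_max = tau_max        # staleness kick-out threshold
        self.cache: Dict[int, CachedModel] = {}

    def own_model(self, now: int) -> CachedModel:
        # Snapshot of the agent's current model, stamped with the current time.
        return CachedModel(self.id, list(self.weights), now)

    def evict_stale(self, now: int) -> None:
        # Keep only models whose staleness is strictly below tau_max.
        self.cache = {o: m for o, m in self.cache.items()
                      if now - m.timestamp < self.tau_max}

    def receive(self, models: List[CachedModel], now: int) -> None:
        for m in models:
            if m.owner == self.id or now - m.timestamp >= self.tau_max:
                continue  # skip our own model and models that are too stale
            held = self.cache.get(m.owner)
            if held is None or m.timestamp > held.timestamp:
                self.cache[m.owner] = m  # keep the freshest copy per owner
        # Assumed eviction rule: drop the oldest entries when over capacity.
        while len(self.cache) > self.cache_size:
            oldest = min(self.cache.values(), key=lambda m: m.timestamp)
            del self.cache[oldest.owner]

    def aggregate(self, now: int) -> None:
        # Unweighted average of the local model and every cached model.
        self.evict_stale(now)
        models = [self.weights] + [m.weights for m in self.cache.values()]
        self.weights = [sum(col) / len(models) for col in zip(*models)]


def meet(a: Agent, b: Agent, now: int) -> None:
    # Each agent sends its own model plus everything in its cache.
    a_out = [a.own_model(now)] + list(a.cache.values())
    b_out = [b.own_model(now)] + list(b.cache.values())
    a.receive(b_out, now)
    b.receive(a_out, now)
```

In this sketch, an agent that has never met agent 2 can still receive agent 2's model second-hand from an agent that cached it earlier, which is the delay-tolerant spreading the abstract refers to.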

Paper Structure

This paper contains 32 sections, 2 theorems, 27 equations, 15 figures, 6 tables, and 3 algorithms.

Key Result

Theorem 4

Assume that $F$ is $L$-smooth and convex, and that each agent executes $K$ local updates before meeting and exchanging models, and then performs model aggregation. We also assume bounded staleness $\tau < \tau_{max}$, where $\tau_{max}$ is the kick-out threshold. Furthermore, we assume, $\forall x\in \mathbb{R}^d, i \in \dots$
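
For reference, the two assumptions on $F$ in the theorem are commonly written in the following standard textbook forms. These are not copied from the paper, whose own definitions (listed under Theorems & Definitions) may be phrased differently. For all $x, y \in \mathbb{R}^d$:

$$
\|\nabla F(x) - \nabla F(y)\| \le L \,\|x - y\| \quad \text{($L$-smoothness)}, \qquad
F(y) \ge F(x) + \langle \nabla F(x),\, y - x \rangle \quad \text{(convexity)}.
$$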

Figures (15)

  • Figure 1: Manhattan Mobility Model Map. The dots represent intersections, and the edges between them represent roads in Manhattan.
  • Figure 2: DFL with Caching vs. DFL without Caching.
  • Figure 3: DFL with LRU at Different Cache Sizes.
  • Figure 4: Impact of $\tau_{max}$ on Model Convergence.
  • Figure 5: Convergence at Different Mobility Speed.
  • ...and 10 more figures

Theorems & Definitions (11)

  • Remark 1
  • Remark 2
  • Remark 3
  • Definition 1: Smoothness
  • Definition 2: Bounded Variance
  • Definition 3: L-Lipschitz Continuous Gradient
  • Theorem 4
  • Definition 4: Smoothness
  • Definition 5: Bounded Variance
  • Definition 6: L-Lipschitz Continuous Gradient
  • ...and 1 more