Streaming Federated Learning with Markovian Data

Tan-Khiem Huynh, Malcolm Egan, Giovanni Neglia, Jean-Marie Gorce

TL;DR

The paper develops convergence results for streaming federated learning (FL) when client data streams are non-stationary Markov processes. It analyzes Minibatch SGD, Local SGD, and Local SGD with momentum under smooth non-convex objectives, and shows that increasing the number of clients yields a linear speed-up in sample complexity, although the overall complexity is higher than with i.i.d. data because of the Markovian dependencies. A key insight is that gradient noise is amplified by a factor determined by the Markov chain's spectral properties, and that some gradient bias cannot be eliminated through the step size alone, which motivates sufficient local computation or momentum-based methods to control drift. Experiments on environmental-monitoring data demonstrate the practical benefit of FL collaboration with dependent data, while underscoring the trade-offs introduced by Markovian streams. Overall, the work extends the theoretical foundations of FL to streaming, dependent-data settings and provides guidance on algorithm choice and parameter tuning in such regimes.

Abstract

Federated learning (FL) is now recognized as a key framework for communication-efficient collaborative learning. Most theoretical and empirical studies, however, rely on the assumption that clients have access to pre-collected data sets, with limited investigation into scenarios where clients continuously collect data. In many real-world applications, particularly when data is generated by physical or biological processes, client data streams are often modeled by non-stationary Markov processes. Unlike standard i.i.d. sampling, the performance of FL with Markovian data streams remains poorly understood due to the statistical dependencies between client samples over time. In this paper, we investigate whether FL can still support collaborative learning with Markovian data streams. Specifically, we analyze the performance of Minibatch SGD, Local SGD, and a variant of Local SGD with momentum. We answer affirmatively under standard assumptions and smooth non-convex client objectives: the sample complexity is proportional to the inverse of the number of clients with a communication complexity comparable to the i.i.d. scenario. However, the sample complexity for Markovian data streams remains higher than for i.i.d. sampling.
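
To make the setting concrete, the following is a minimal sketch of Local SGD in which each client draws its samples from its own Markovian stream. It is a toy illustration under stated assumptions, not the paper's algorithm listing or experimental setup: the stream is a scalar AR(1) chain started away from stationarity, the objective is a simple quadratic (mean estimation) rather than a non-convex one, and the names `ar1_step`, `local_sgd`, `eta` (local step size), `gamma` (server step size), and `rho` (chain correlation) are all hypothetical.

```python
import math
import random


def ar1_step(x, rho, rng):
    """Advance a scalar AR(1) chain by one step (stationary law is N(0, 1))."""
    return rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)


def local_sgd(mu, num_clients=8, local_steps=10, rounds=300,
              eta=0.02, gamma=1.0, rho=0.9, seed=0):
    """Toy Local SGD for f(w) = E[0.5 * (w - (mu + x))^2], minimized at w = mu.

    Assumption: every client observes mu plus AR(1) noise, so consecutive
    samples on a client are statistically dependent.
    """
    rng = random.Random(seed)
    w = 0.0
    # Assumption: chains start at 3.0, i.e. not from the stationary law.
    states = [3.0] * num_clients
    for _ in range(rounds):
        local_models = []
        for m in range(num_clients):
            w_m = w  # each client starts the round from the global model
            for _ in range(local_steps):
                states[m] = ar1_step(states[m], rho, rng)
                sample = mu + states[m]
                w_m -= eta * (w_m - sample)  # stochastic gradient step
            local_models.append(w_m)
        # Server step: move the global model toward the client average.
        avg = sum(local_models) / num_clients
        w += gamma * (avg - w)
    return w


if __name__ == "__main__":
    print(local_sgd(mu=2.0))
```

In this toy model, setting `rho=0.0` recovers i.i.d. sampling, and varying `num_clients` or `rho` gives a rough feel for how the number of clients and the strength of the temporal dependence affect the error of the final iterate.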

Paper Structure

This paper contains 30 sections, 28 theorems, 143 equations, 9 figures, 2 tables, 3 algorithms.

Key Result

Theorem 4.5

For the problem class $\mathcal{F}_1(L, \sigma, \nu_{ps}, C_{\infty})$, with global step size $\gamma \leq 1/L$, the iterates of Minibatch SGD satisfy:

Figures (9)

  • Figure 1: Gradient norm as a function of the number of communication rounds for Local SGD, Minibatch SGD, Local SGD-M, and SCAFFOLD, with $\gamma = 0.1, \eta = 0.01$, $\beta = 0.5$, $\lambda = 0.01$ for 120 clients (each client has access to 12 consecutive months of training data) and different numbers of local steps.
  • Figure 2: Gradient norm as a function of the number of communication rounds for Local SGD, Minibatch SGD & Local SGD-M, with $\gamma = 0.1, \eta = 0.001$, $\beta = 0.5$, $\lambda = 0.01$ and $K=100$ for different numbers of clients. Each client has access to a window of $6$ consecutive months of training data.
  • Figure 3: Original time series for PM2.5 pollution, temperature, and SO2 pollution. Observe the periodicity in the data indicating seasonality.
  • Figure 4: Time series of PM2.5, temperature, and SO2 with seasonality removed, which is utilized (along with the other pollution and meteorological variables) for the experiments.
  • Figure 5: Auto-correlation of PM2.5, SO2, and temperature time series with seasonality removed. Observe that, particularly for the temperature, there are high levels of correlation at small lags, indicating dependence between adjacent samples.
  • ...and 4 more figures
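
The dependence between adjacent samples mentioned in the caption of Figure 5 is typically quantified with the sample autocorrelation. The snippet below is a minimal sketch of that statistic in plain Python; the function name `autocorrelation` and the normalization by the lag-0 sum are assumptions made for illustration, and the paper may compute its plots differently.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer `lag`.

    Uses the common estimator that divides the lagged sum of products of
    mean-centered values by the lag-0 sum of squares.
    """
    n = len(series)
    if lag < 0 or lag >= n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    centered = [value - mean for value in series]
    denom = sum(c * c for c in centered)
    if denom == 0.0:
        raise ValueError("series is constant; autocorrelation is undefined")
    numer = sum(centered[t] * centered[t + lag] for t in range(n - lag))
    return numer / denom


if __name__ == "__main__":
    print(autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0], 1))
```

Values close to 1 at small lags indicate that neighbouring samples carry largely redundant information, which is the regime where the i.i.d. assumption is least appropriate.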

Theorems & Definitions (53)

  • Theorem 4.5
  • Corollary 4.6
  • Theorem 4.7
  • Corollary 4.8
  • Theorem 4.9
  • Corollary 4.10
  • Definition B.1
  • Definition B.2
  • Proposition B.3: Meyn_Tweedie_Glynn_2009
  • Definition B.4
  • ...and 43 more