Streaming Federated Learning with Markovian Data
Tan-Khiem Huynh, Malcolm Egan, Giovanni Neglia, Jean-Marie Gorce
TL;DR
The paper develops convergence results for streaming federated learning (FL) when client data streams are non-stationary Markov processes. It analyzes Minibatch SGD, Local SGD, and Local SGD with momentum for smooth non-convex objectives. It shows that the sample complexity enjoys a linear speed-up in the number of clients, although it remains higher than under i.i.d. data because of Markovian dependencies. A key insight is that gradient noise is amplified by a factor determined by the spectral properties of the Markov chain, and that part of the gradient bias cannot be removed by stepsize tuning alone. This motivates sufficient local computation, or momentum-based methods, to control drift. Experiments on environmental-monitoring data demonstrate the practical benefit of FL collaboration with dependent data, while highlighting the trade-offs introduced by Markovian streams. Overall, the work extends the theoretical foundations of FL to streaming, dependent data and offers guidance on algorithm choice and parameter tuning in this regime.
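To make the noise-amplification point concrete, the sketch below uses a toy two-state Markov chain. The chain, its parameters p and q, and the function names are illustrative assumptions, not the paper's model. The transition matrix [[1-p, p], [q, 1-q]] has eigenvalues 1 and λ = 1 - p - q, and for a stationary two-state chain the lag-k autocorrelation of any function of the state is λ^k. The asymptotic variance of a sample mean is therefore inflated, relative to i.i.d. sampling, by the classical factor 1 + 2 Σ λ^k = (1 + λ)/(1 - λ). This is a standard textbook quantity; the exact constant appearing in the paper's bounds may differ.

```python
def second_eigenvalue(p: float, q: float) -> float:
    """Second eigenvalue of the toy chain [[1-p, p], [q, 1-q]] (hypothetical example)."""
    return 1.0 - p - q


def inflation_closed_form(p: float, q: float) -> float:
    """Asymptotic variance inflation (1 + lam) / (1 - lam); requires 0 < p + q < 2."""
    lam = second_eigenvalue(p, q)
    return (1.0 + lam) / (1.0 - lam)


def inflation_series(p: float, q: float, terms: int = 1000) -> float:
    """Same factor as 1 + 2 * sum_k lam^k, i.e. summing the lag-k autocorrelations."""
    lam = second_eigenvalue(p, q)
    return 1.0 + 2.0 * sum(lam ** k for k in range(1, terms + 1))


# p = q = 0.5 gives lam = 0: consecutive samples are independent, factor 1.
print(inflation_closed_form(0.5, 0.5))
# p = q = 0.1 gives lam = 0.8: a "sticky" chain, variance inflated about 9x.
print(inflation_closed_form(0.1, 0.1), inflation_series(0.1, 0.1))
```

The slower the chain mixes (λ close to 1), the larger the factor, which is the intuition behind noise amplification driven by the chain's spectral properties. A negative λ (for example p = q = 0.9) gives a factor below 1.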
Abstract
Federated learning (FL) is now recognized as a key framework for communication-efficient collaborative learning. Most theoretical and empirical studies, however, assume that clients have access to pre-collected data sets, and scenarios in which clients continuously collect data have received limited attention. In many real-world applications, particularly when data is generated by physical or biological processes, client data streams are often modeled as non-stationary Markov processes. In contrast to standard i.i.d. sampling, the performance of FL with Markovian data streams remains poorly understood because client samples are statistically dependent over time. In this paper, we investigate whether FL can still support collaborative learning with Markovian data streams. Specifically, we analyze the performance of Minibatch SGD, Local SGD, and a variant of Local SGD with momentum. Under standard assumptions and smooth non-convex client objectives, we answer affirmatively: the sample complexity is inversely proportional to the number of clients, and the communication complexity is comparable to that of the i.i.d. scenario. However, the sample complexity for Markovian data streams remains higher than for i.i.d. sampling.
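As a rough illustration of the streaming setting, the following toy sketch runs Local SGD in which each client's data comes from its own persistent two-state Markov chain. The quadratic objective f(θ) = E[(θ - x)^2]/2, the chain parameters, and the function names are hypothetical choices made for illustration; this is not the authors' implementation, and it omits the momentum variant.

```python
import random
from typing import List


def markov_step(state: int, p: float, q: float, rng: random.Random) -> int:
    """One transition of a toy two-state chain: 0 -> 1 w.p. p, 1 -> 0 w.p. q."""
    if state == 0:
        return 1 if rng.random() < p else 0
    return 0 if rng.random() < q else 1


def local_sgd_markov(num_clients: int, rounds: int, local_steps: int,
                     lr: float, p: float, q: float, seed: int = 0) -> float:
    """Local SGD on f(theta) = E[(theta - x)^2] / 2, where each client observes
    x in {0, 1} from its own Markov stream (hypothetical toy setup)."""
    rngs = [random.Random(seed + m) for m in range(num_clients)]
    # Each client's chain state persists across rounds: data is streamed,
    # not resampled i.i.d. from a fixed data set.
    states = [0] * num_clients
    theta = 0.0  # global model held by the server
    for _ in range(rounds):
        local_models: List[float] = []
        for m in range(num_clients):
            w = theta  # each client starts from the latest global model
            for _ in range(local_steps):
                states[m] = markov_step(states[m], p, q, rngs[m])
                x = float(states[m])
                w -= lr * (w - x)  # stochastic gradient of (w - x)^2 / 2
            local_models.append(w)
        # Communication round: the server averages the local models.
        theta = sum(local_models) / num_clients
    return theta


print(local_sgd_markov(num_clients=8, rounds=500, local_steps=10,
                       lr=0.01, p=0.1, q=0.3))
```

With p = 0.1 and q = 0.3 the chain's stationary mean is p/(p + q) = 0.25, which is the minimizer of the toy objective, so the printed value should lie near 0.25. Setting local_steps=1 reduces each round to a single averaged gradient step using one sample per client, i.e. Minibatch SGD in this toy model.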
