Sparse Uncertainty-Informed Sampling from Federated Streaming Data
Manuel Röder, Frank-Michael Schleif
TL;DR
The paper addresses selective labeling of federated streaming data under non-IID distributions and limited labeling budgets. It introduces a volume-sampling-based decision rule that operates on penultimate-layer representations $\tau_i(x_t) = A_i(f_i(x_t))$ and on a probability $p_t$ that depends on a tracked inverse covariance. For numerical stability on resource-constrained devices, the method updates the inverse covariance with a Cholesky-based low-rank update rather than the Woodbury identity. Experiments show improved training batch diversity, robust numerical stability, and competitive runtime, supporting the method's practical applicability to federated streaming and on-device learning. The codebase is publicly available.
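To make the mechanism summarized in the TL;DR concrete, the following is a minimal sketch and not the authors' implementation. It assumes a regularized covariance $A_t = \lambda I + \sum_s z_s z_s^\top$ over selected representations $z_t$, a selection probability of the form $p_t = \min(1, c \cdot z_t^\top A_t^{-1} z_t)$, and an update applied only when a sample is selected. These choices, together with the names `chol_rank1_update`, `labeling_probability`, `select_from_stream`, `reg`, and `scale`, are illustrative assumptions; the exact rule is the one defined in the paper. The sketch keeps a Cholesky factor of $A_t$ and never forms the inverse explicitly.

```python
import numpy as np


def chol_rank1_update(L, z):
    """Return the lower Cholesky factor of L @ L.T + z z^T in O(d^2)."""
    L = L.copy()
    z = np.array(z, dtype=float)  # working copy, modified in place below
    d = z.shape[0]
    for k in range(d):
        r = np.hypot(L[k, k], z[k])          # new diagonal entry
        c, s = r / L[k, k], z[k] / L[k, k]   # rotation-like coefficients
        L[k, k] = r
        if k + 1 < d:
            L[k + 1:, k] = (L[k + 1:, k] + s * z[k + 1:]) / c
            z[k + 1:] = c * z[k + 1:] - s * L[k + 1:, k]
    return L


def forward_solve(L, z):
    """Solve L w = z for lower-triangular L by forward substitution."""
    w = np.zeros(len(z), dtype=float)
    for k in range(len(z)):
        w[k] = (z[k] - L[k, :k] @ w[:k]) / L[k, k]
    return w


def labeling_probability(L, z, scale=1.0):
    """Assumed rule: p = min(1, scale * z^T A^{-1} z), with A = L L^T."""
    w = forward_solve(L, z)          # ||w||^2 equals z^T A^{-1} z
    return min(1.0, scale * float(w @ w))


def select_from_stream(features, budget, reg=1.0, scale=1.0, seed=0):
    """Decide per observation, without buffering, until the budget is spent."""
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    L = np.sqrt(reg) * np.eye(d)     # Cholesky factor of reg * I
    chosen = []
    for t, z in enumerate(features):
        if len(chosen) >= budget:
            break
        if rng.random() < labeling_probability(L, z, scale):
            chosen.append(t)
            L = chol_rank1_update(L, z)  # assumption: update on selection only
    return chosen
```

Calling `select_from_stream(np.random.default_rng(1).normal(size=(200, 8)), budget=10)` returns at most ten stream indices. Under the assumed rule, directions already well covered by earlier selections yield a small $z_t^\top A_t^{-1} z_t$ and are therefore selected less often, which is one way to encourage batch diversity.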
Abstract
We present a numerically robust, computationally efficient approach for non-IID data stream sampling in federated client systems, where resources are limited and labeled data for local model adaptation is sparse and expensive. Given a local labeling budget, the proposed method identifies the stream observations most relevant for optimizing the underlying client model and makes instantaneous labeling decisions without relying on any memory buffering strategy. Our experiments on large-scale data streams show enhanced training batch diversity and improved numerical robustness compared to existing strategies, making our approach an effective and convenient solution in federated learning (FL) environments.
