Sparse Uncertainty-Informed Sampling from Federated Streaming Data

Manuel Röder, Frank-Michael Schleif

TL;DR

The paper addresses selective labeling in federated streaming data with non-IID distributions and limited labeling budgets. It introduces a volume-sampling-based decision rule that operates on penultimate-layer representations $\tau_i(x_t) = A_i(f_i(x_t))$ and a labeling probability $p_t$ that depends on a tracked inverse covariance matrix. For numerical stability on resource-constrained devices, the method updates the inverse covariance with a Cholesky-based low-rank update instead of the Woodbury identity. Experiments show improved training batch diversity, robust numerical stability, and competitive runtime, indicating practical applicability to federated streaming and on-device learning. The codebase is publicly available.
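To make the Cholesky-based update more concrete, here is a minimal NumPy sketch of the textbook rank-one Cholesky update, together with a quadratic form evaluated through the factor instead of an explicit inverse. It is not taken from the paper's codebase: the function names, the `scale` parameter, and in particular the clipped rule in `label_probability` are illustrative assumptions, since this summary does not give the exact formula for $p_t$.

```python
import numpy as np

def chol_rank1_update(L, x):
    """Return lower-triangular L_new with L_new @ L_new.T == L @ L.T + outer(x, x).

    Standard rank-one Cholesky update; costs O(d^2) instead of refactorizing.
    """
    L = np.array(L, dtype=float)   # work on copies, leave inputs untouched
    x = np.array(x, dtype=float)
    n = x.shape[0]
    for k in range(n):
        r = np.hypot(L[k, k], x[k])
        c = r / L[k, k]
        s = x[k] / L[k, k]
        L[k, k] = r
        if k + 1 < n:
            L[k + 1:, k] = (L[k + 1:, k] + s * x[k + 1:]) / c
            x[k + 1:] = c * x[k + 1:] - s * L[k + 1:, k]
    return L

def quad_form_inv(L, x):
    """Compute x^T (L L^T)^{-1} x without forming the inverse."""
    # np.linalg.solve is a general solver; a dedicated triangular solve
    # would exploit the structure of L and be cheaper.
    z = np.linalg.solve(L, x)
    return float(z @ z)

def label_probability(L, x, scale=1.0):
    # ASSUMPTION: hypothetical decision rule for illustration only.
    # The paper's actual p_t is not specified in this summary.
    return min(1.0, scale * quad_form_inv(L, x))

# Toy stream: decide instantly per observation, no memory buffer.
rng = np.random.default_rng(0)
d = 4
L = np.linalg.cholesky(np.eye(d))  # factor of a regularized initial covariance
labeled = 0
for _ in range(20):
    x = rng.normal(size=d)          # stands in for a representation tau_i(x_t)
    if rng.random() < label_probability(L, x, scale=0.1):
        labeled += 1
        L = chol_rank1_update(L, x)  # track the covariance of selected points
```

Keeping only the triangular factor means the tracked matrix stays symmetric positive definite by construction, which is the usual motivation for preferring this kind of update over repeated Woodbury-style inverse updates in low precision.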

Abstract

We present a numerically robust, computationally efficient approach for non-I.I.D. data stream sampling in federated client systems, where resources are limited and labeled data for local model adaptation is sparse and expensive. The proposed method identifies relevant stream observations to optimize the underlying client model, given a local labeling budget, and performs instantaneous labeling decisions without relying on any memory buffering strategies. Our experiments show enhanced training batch diversity and an improved numerical robustness of the proposal compared to existing strategies over large-scale data streams, making our approach an effective and convenient solution in FL environments.

Paper Structure (6 sections, 3 equations, 4 figures, 1 algorithm)

Figures (4)

  • Figure 1: Sparse sample selection from federated streaming data, exemplified by conveyor belt object scanning on client $i$: the orange, dashed data flow illustrates the decision-making algorithm; the red data flow depicts the fine-tuning pipeline.
  • Figure 2: Relative matrix reconstruction error comparison.
  • Figure 3: Algorithmic complexity and average wall-clock runtime comparison. Results calculated over three runs.
  • Figure 4: Model decision regions.