An Algorithm-Centered Approach To Model Streaming Data

Fabian Hinder, Valerie Vaquet, David Komnick, Barbara Hammer

TL;DR

The paper introduces a window-centered framework for streaming data with concept drift. It reframes drift in terms of windowed distributions $\mathcal{D}_W$, window systems (WS), and windowed distribution systems (WDS), which mirror how sliding-window algorithms operate. It establishes formal connections between window-based models and traditional time-point distribution processes, showing that drift corresponds to a non-constant WDS with a compatible time distribution, and that on finite horizons the two viewpoints are effectively equivalent. An algorithmic approach is proposed to recover the time marginal $P_T$ and the time-varying distribution $\mathcal{D}_t$ from observed window statistics via a coordinate-descent optimization. It is validated through reconstruction experiments and a water-distribution network case study. The proofs are constructive, linking theory to algorithm design, and the authors point toward extending the framework to infinite horizons to support statistically principled analysis of truly streaming data.

Abstract

Besides the classical offline setup of machine learning, stream learning constitutes a well-established setup where data arrives over time in potentially non-stationary environments. Concept drift, the phenomenon that the underlying distribution changes over time, poses a significant challenge. Yet, despite high practical relevance, there is little to no foundational theory for learning in the drifting setup comparable to classical statistical learning theory in the offline setting. This can be attributed to the lack of an underlying object comparable to a probability distribution as in the classical setup. While there exist approaches to transfer ideas to the streaming setup, these start from a data perspective rather than an algorithmic one. In this work, we suggest a new model of data over time that is aimed at the algorithm's perspective. Instead of defining the setup using time points, we utilize a window-based approach that resembles the inner workings of most stream learning algorithms. We compare our framework to others from the literature on a theoretical basis, showing that in many cases both model the same situation. Furthermore, we perform a numerical evaluation and showcase an application in the domain of critical infrastructure.

Paper Structure

This paper contains 15 sections, 10 theorems, 44 equations, 2 figures, 1 table, 1 algorithm.

Key Result

Proposition 1

By taking the windows, null windows, and window mean distributions of a holistic distribution, we obtain a WDS on finite horizons, i.e., $I$ is a well-defined map. It extends via holistic distributions to distribution processes. Furthermore, $I(\mathcal{D}_t)$ is not constant if and only if $\mathcal{D}_t$ has drift.
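
As a minimal illustration of the statement above on a toy finite horizon, the following Python sketch computes window mean distributions and checks whether they are constant across windows. It assumes, since this excerpt does not spell out the definitions, that a window mean distribution is the $P_T$-weighted mixture of the time-point distributions inside the window and that a null window is one with zero $P_T$-mass. The names `window_mean` and `is_constant`, the chosen windows, and all numbers are hypothetical and are not taken from the paper.

```python
import math

def window_mean(dists, p_t, window):
    """P_T-weighted mixture of the time-point distributions in `window`.

    dists:  list of rows, dists[t][k] = probability of outcome k at time t
    p_t:    list, p_t[t] = probability of time point t (time marginal)
    window: list of time indices
    Returns None for a null window (assumed here to mean zero P_T-mass).
    """
    mass = sum(p_t[t] for t in window)
    if mass == 0:
        return None
    n_outcomes = len(dists[0])
    return [
        sum(p_t[t] * dists[t][k] for t in window) / mass
        for k in range(n_outcomes)
    ]

def is_constant(means, tol=1e-9):
    """True if all non-null window means coincide up to `tol`."""
    means = [m for m in means if m is not None]
    return all(
        math.isclose(a, b, abs_tol=tol)
        for m in means
        for a, b in zip(m, means[0])
    )

# Toy horizon: four time points, two outcomes, uniform time marginal.
p_t = [0.25, 0.25, 0.25, 0.25]
stationary = [[0.5, 0.5]] * 4                  # same distribution at every time
drifting = [[0.9, 0.1]] * 2 + [[0.1, 0.9]] * 2  # abrupt change after t = 1
windows = [[0, 1], [2, 3], [0, 1, 2, 3]]        # hypothetical window system

stat_means = [window_mean(stationary, p_t, w) for w in windows]
drift_means = [window_mean(drifting, p_t, w) for w in windows]

print("stationary constant:", is_constant(stat_means))
print("drifting constant:  ", is_constant(drift_means))
```

With these toy inputs the stationary stream yields the same mean on every window, while the drifting stream does not. This shows a single instance of the equivalence only; it is neither the paper's proof nor its reconstruction algorithm.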

Figures (2)

  • Figure 1: Overview of considered theorems, setups, and stages.
  • Figure 2: Experiment on water data. The figure shows the original consumption data (a; unknown), the cumulative consumption (b; unknown) and the training data (marks in b; known), as well as the time point-wise reconstruction and goal prediction (c).

Theorems & Definitions (26)

  • Definition 1: Distribution Process, Time Window, Holistic Distribution
  • Definition 2: Window System (WS)
  • Definition 3: Windowed Distribution System (WDS), Constant WDS, Extension
  • Proposition 1
  • proof
  • Definition 4
  • Proposition 2
  • Definition 5: Compatible Time Distribution
  • Proposition 3
  • Corollary 1
  • ...and 16 more