An Algorithm-Centered Approach To Model Streaming Data
Fabian Hinder, Valerie Vaquet, David Komnick, Barbara Hammer
TL;DR
The paper introduces a window-centered framework for streaming data with concept drift, reframing drift through windowed distributions $\mathcal{D}_W$ and window systems (WS) that align with sliding-window algorithms. It establishes theoretical connections between window-based models and traditional time-point distribution processes, showing that drift can be captured by non-constant WDSs with a compatible time distribution, and that under finite horizons the two viewpoints are effectively equivalent. The authors propose an algorithm that recovers the time marginal $P_T$ and the time-varying distribution $\mathcal{D}_t$ from observed window statistics via coordinate-descent optimization, and validate it through reconstruction experiments and a water-distribution network case study. The work provides constructive proofs, bridges theory and algorithm design, and points toward extending the framework to infinite horizons, which would enable statistically principled analysis of truly streaming data.
Abstract
Besides the classical offline setup of machine learning, stream learning constitutes a well-established setup where data arrives over time in potentially non-stationary environments. Concept drift, the phenomenon that the underlying distribution changes over time, poses a significant challenge. Yet, despite high practical relevance, there is little to no foundational theory for learning in the drifting setup comparable to classical statistical learning theory in the offline setting. This can be attributed to the lack of an underlying object comparable to a probability distribution as in the classical setup. While there exist approaches to transfer ideas to the streaming setup, these start from a data perspective rather than an algorithmic one. In this work, we suggest a new model of data over time that is aimed at the algorithm's perspective. Instead of defining the setup using time points, we utilize a window-based approach that resembles the inner workings of most stream learning algorithms. We compare our framework to others from the literature on a theoretical basis, showing that in many cases both model the same situation. Furthermore, we perform a numerical evaluation and showcase an application in the domain of critical infrastructure.
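To make the window-based view concrete, the following minimal Python sketch shows how a sliding-window algorithm only ever sees statistics of a window, not of individual time points. It is an illustration under stated assumptions, not the paper's method: the function names `sliding_window_means` and `make_stream`, the Gaussian stream, the abrupt mean shift at `drift_at`, and the window size are all hypothetical choices made for this example.

```python
import random
from collections import deque
from statistics import fmean


def sliding_window_means(stream, window_size):
    """Yield (t, mean of the last `window_size` observations) once the window is full.

    Hypothetical helper: the mean stands in for any window statistic
    an algorithm might compute.
    """
    window = deque(maxlen=window_size)  # oldest element is dropped automatically
    for t, x in enumerate(stream):
        window.append(x)
        if len(window) == window_size:
            yield t, fmean(window)


def make_stream(n=200, drift_at=100, seed=0):
    """Synthetic stream (an assumption for illustration): N(0, 1) before
    `drift_at`, N(3, 1) from `drift_at` onward, i.e. one abrupt drift."""
    rng = random.Random(seed)
    return [rng.gauss(0.0 if t < drift_at else 3.0, 1.0) for t in range(n)]


stream = make_stream()
means = dict(sliding_window_means(stream, window_size=20))

# Window ending at t=99 contains only pre-drift data,
# window ending at t=119 contains only post-drift data,
# window ending at t=109 mixes both regimes.
for t in (99, 109, 119):
    print(f"window ending at t={t}: mean = {means[t]:.2f}")
```

A window that straddles the drift point yields a statistic of a mixture of the pre-drift and post-drift distributions. In this sketch, drift shows up as a window statistic that is not constant across windows, which is the kind of object the window-based framework takes as its starting point.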
