Tracking Changing Probabilities via Dynamic Learners
Omid Madani
TL;DR
This paper tackles online probabilistic multiclass prediction over unbounded streams with strict memory limits, addressing external and internal nonstationarity by separating salient predictions from noise. It introduces Sparse Moving Averages (SMAs), notably the sparse EMA and a queue-based Qs predictor, and proposes DYAL, a hybrid that dynamically blends per-predictand EMA learning with queue-based switching to adapt rapidly to changes. The evaluation framework uses bounded log-loss for NS items and proper scoring principles to compare open-ended predictors under nonstationarity. The experiments show that per-predictand learning rates, as combined in DYAL, yield faster adaptation and lower variance than single-rate EMA or queue methods in many regimes. The findings are relevant to lifelong, continual learning systems and real-world data streams where concepts emerge and evolve: they enable robust probability estimates for salient items while keeping memory usage bounded.
Abstract
Consider a predictor, a learner, whose input is a stream of discrete items. The predictor's task, at every time point, is probabilistic multiclass prediction, i.e., to predict which item may occur next by outputting zero or more candidate items, each with a probability, after which the actual item is revealed and the predictor updates. To output probabilities, the predictor keeps track of the proportions of the items it has seen. The stream is unbounded (lifelong), and the predictor has finite, limited space. The task is open-ended: the set of items is unknown to the predictor, and their totality can also grow unbounded. Moreover, there is non-stationarity: the underlying frequencies of items may change, substantially, from time to time. For instance, new items may start appearing and a few recently frequent items may cease to occur. The predictor, being space-bounded, need only provide probabilities for those items that, at the time of prediction, have sufficiently high frequency, i.e., the salient items. This problem is motivated in the setting of Prediction Games, a self-supervised learning regime where concepts serve as both the predictors and the predictands, and the set of concepts grows over time, resulting in non-stationarities as new concepts are generated and used. We design and study a number of predictors, sparse moving averages (SMAs), for the task. One SMA adapts the sparse exponentiated moving average and another is based on queuing a few counts, keeping dynamic per-item histories. Evaluating the predicted probabilities, under noise and non-stationarity, presents challenges, and we discuss and develop evaluation methods, one based on bounding log-loss. We show that a combination of ideas, supporting dynamic predictand-specific learning rates, offers advantages in terms of faster adaptation to change (plasticity), while also supporting low variance (stability).
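To make the predict-then-update loop concrete, the following is a minimal sketch of a sparse moving average with a single fixed learning rate. It is an illustration under stated assumptions, not the predictors studied in the paper: the class name SparseEMA, the parameters rate and min_prob, and the pruning rule are all hypothetical choices made for this example.

```python
class SparseEMA:
    """Illustrative sparse exponential moving average over a stream of items.

    Assumptions (not taken from the paper): one shared learning rate `rate`,
    and pruning of any item whose weight decays below `min_prob`.
    """

    def __init__(self, rate=0.1, min_prob=0.01):
        self.rate = rate
        self.min_prob = min_prob
        self.weights = {}  # item -> estimated probability

    def predict(self):
        # Candidate items with probabilities; the total may be below 1,
        # leaving the remaining mass for unseen or pruned items.
        return dict(self.weights)

    def update(self, item):
        # Decay every tracked weight, dropping those that fall below the
        # threshold so that the map stays small (bounded memory).
        for key in list(self.weights):
            decayed = self.weights[key] * (1.0 - self.rate)
            if decayed < self.min_prob and key != item:
                del self.weights[key]
            else:
                self.weights[key] = decayed
        # Boost the item that was just observed.
        self.weights[item] = self.weights.get(item, 0.0) + self.rate


predictor = SparseEMA(rate=0.1, min_prob=0.01)
for observed in ["a", "a", "b", "a", "c", "a"]:
    candidates = predictor.predict()  # predict before the item is revealed
    predictor.update(observed)        # then update on the revealed item
```

Because the weights never sum to more than 1 and every retained weight (other than the item just observed) is at least min_prob, the map holds on the order of 1/min_prob entries at most. A single fixed rate forces a trade-off between fast adaptation and low variance, which is the tension that predictand-specific learning rates are meant to ease.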
