Kalman Filter for Online Classification of Non-Stationary Data

Michalis K. Titsias, Alexandre Galashov, Amal Rannen-Triki, Razvan Pascanu, Yee Whye Teh, Jorg Bornschein

TL;DR

We introduce a probabilistic Bayesian online learning model that combines a (possibly pretrained) neural representation with a state space model over the linear predictor weights. The weights evolve under a parameter drift transition density, parametrized by a coefficient that quantifies forgetting.
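One way to write such a model is sketched below. Several choices here are illustrative assumptions: the variance-preserving form of the drift, the prior variance $\sigma_0^2$, the feature map $\phi(x_n)$ and the Gaussian observation noise $\sigma^2$. They are picked to agree with the relation $\gamma_n^2 = \exp(-\delta_n)$ quoted in the Figure 1 caption, and the paper's exact parametrization may differ.

```latex
\begin{aligned}
\theta_0 &\sim \mathcal{N}\big(0,\ \sigma_0^2 I\big),\\
\theta_n \mid \theta_{n-1} &\sim \mathcal{N}\big(\gamma_n \theta_{n-1},\ (1-\gamma_n^2)\,\sigma_0^2 I\big),
  \qquad \gamma_n^2 = \exp(-\delta_n),\quad \delta_n \ge 0,\\
y_n \mid \theta_n &\sim \mathcal{N}\big(\phi(x_n)^\top \theta_n,\ \sigma^2\big).
\end{aligned}
```

Under this assumed form, the marginal of $\theta_n$ stays $\mathcal{N}(0, \sigma_0^2 I)$ for every $n$. Setting $\gamma_n = 1$ keeps the weights fixed (no forgetting), while $\gamma_n \to 0$ resets them to the prior.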

Abstract

In Online Continual Learning (OCL) a learning system receives a stream of data and sequentially performs prediction and training steps. Important challenges in OCL concern automatic adaptation to the particular non-stationary structure of the data and quantification of predictive uncertainty. Motivated by these challenges, we introduce a probabilistic Bayesian online learning model that uses a (possibly pretrained) neural representation and a state space model over the linear predictor weights. Non-stationarity in the linear predictor weights is modelled using a parameter drift transition density, parametrized by a coefficient that quantifies forgetting. Inference in the model is implemented with efficient Kalman filter recursions that track the posterior distribution over the linear weights. In parallel, online SGD updates of the transition dynamics coefficient allow the model to adapt to the non-stationarity seen in the data. Although the framework is developed assuming a linear Gaussian model, we also extend it to classification problems and to fine-tuning the deep learning representation. In a set of multi-class classification experiments on data sets such as CIFAR-100 and CLOC, we demonstrate the predictive ability of the model and its flexibility in capturing non-stationarity.
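The Kalman filter recursion mentioned above can be sketched for a single scalar regression target. This is a minimal NumPy sketch, not the authors' implementation, and it rests on several assumptions:

* a variance-preserving drift $\theta_n = \gamma\theta_{n-1} + \sqrt{1-\gamma^2}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma_0^2 I)$;
* given feature vectors `phi`;
* Gaussian observation noise.

The function name `kalman_step` and its parameters are hypothetical. Each call first scores $y_n$ under the one-step predictive density and only then conditions on it, following the predict-then-train protocol of OCL.

```python
import numpy as np


def kalman_step(m, P, phi, y, gamma, sigma0_sq, noise_var):
    """One predict/score/update step of a Kalman filter over linear weights.

    Assumed (illustrative) model, not taken verbatim from the paper:
        theta_n = gamma * theta_{n-1} + sqrt(1 - gamma**2) * eps,  eps ~ N(0, sigma0_sq * I)
        y_n     = phi @ theta_n + noise,                          noise ~ N(0, noise_var)
    Returns the updated posterior mean and covariance and log p(y_n | y_{1:n-1}).
    """
    d = m.shape[0]
    # Predict: push the previous posterior through the drift transition.
    m_pred = gamma * m
    P_pred = gamma**2 * P + (1.0 - gamma**2) * sigma0_sq * np.eye(d)
    # Score y_n under the one-step predictive density before training on it.
    y_mean = phi @ m_pred
    s = phi @ P_pred @ phi + noise_var
    log_pred = -0.5 * (np.log(2.0 * np.pi * s) + (y - y_mean) ** 2 / s)
    # Update: condition on y_n with a rank-one Kalman gain.
    k = P_pred @ phi / s
    m_new = m_pred + k * (y - y_mean)
    P_new = P_pred - np.outer(k, phi @ P_pred)
    return m_new, P_new, log_pred


# Usage on a hypothetical stream whose true weights flip sign halfway through.
rng = np.random.default_rng(0)
d = 3
m, P = np.zeros(d), np.eye(d)  # prior N(0, sigma0_sq * I) with sigma0_sq = 1
total_log_pred = 0.0
for n in range(200):
    phi = rng.normal(size=d)
    theta_true = np.ones(d) if n < 100 else -np.ones(d)
    y = phi @ theta_true + 0.1 * rng.normal()
    m, P, lp = kalman_step(m, P, phi, y, gamma=0.99, sigma0_sq=1.0, noise_var=0.01)
    total_log_pred += lp
```

With `gamma=1.0` the recursion reduces to exact sequential Bayesian linear regression, so the summed `log_pred` values equal the batch log marginal likelihood. A value of `gamma` below 1 lets the posterior discount old observations, so it can follow the sign flip at step 100 instead of averaging over both regimes.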

Paper Structure (29 sections, 16 equations, 7 figures, 3 tables, 1 algorithm)

Figures (7)

  • Figure 1: Results on an artificial time series example of $3058$ observations. The top row of the left panel shows the data (black dots) and the predicted mean and uncertainty (orange lines) over $y_n$ (data arrive sequentially from left to right and we perform online next-step prediction), while the bottom row shows the optimized values of $\gamma_n^2 = \exp(-\delta_n)$. The right panel shows the accumulated average log predictive density, i.e. $\frac{1}{n} \sum_{i=1}^n \log p(y_i | y_{1:i-1})$, computed across time for the model that learns $\gamma_n$ and for the model that ignores non-stationarity by setting $\gamma_n=1$ for all $n$.
  • Figure 2: Non-stationary CIFAR-100. The left plot shows the evolution of $\gamma$, and the right plot shows the corresponding average online accuracy. The red dashed lines mark the task boundaries.
  • Figure 3: CLOC results. The left plot shows results when learning starts from scratch; the right plot shows results when starting from a pretrained model. We also report results for external baselines taken from ghunaim2023real: ER and ACE caccia2022new. Note that in the left plot the top two curves, which lie on top of each other, are Online SGD with replay and the Kalman filter with a finetuned backbone. In the right plot, the Kalman filter outperforms the other methods considerably.
  • Figure 4: Online prediction on the artificial time series example by applying the Kalman filter model with fixed $\gamma_n=1$.
  • Figure 5: CLOC log probabilities for the Kalman filter with finetuned backbone and finetuned delta. We show the data (black dots) and the predicted mean and uncertainty (orange lines) over $y_n$ (data arrive sequentially from left to right and we perform online next-time-step prediction).
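The online SGD adaptation of $\gamma_n$ and the accumulated average log predictive density plotted in Figure 1 can be illustrated on a scalar local-level model ($\phi(x_n) = 1$, so $y_n = \theta_n$ plus noise). This minimal sketch is our own simplification and not the authors' procedure. Each step does the following:

1. Scores $y_n$ with the current $\delta_n$.
2. Computes $\partial \log p(y_n \mid y_{1:n-1}) / \partial \delta_n$ in closed form, treating the previous posterior $(m, v)$ as fixed.
3. Performs the Kalman update.
4. Takes one gradient-ascent step on $\delta$, clamped at $\delta \ge 0$ so that $\gamma_n \le 1$.

The functions `predictive_and_grad` and `run_filter`, the variance-preserving drift, the step size `lr` and the synthetic data are all hypothetical.

```python
import numpy as np


def predictive_and_grad(m, v, y, delta, sigma0_sq, noise_var):
    """log p(y_n | y_{1:n-1}) for an assumed scalar drift model, and its derivative in delta.

    The previous posterior N(m, v) is treated as a constant (a simplifying assumption).
    """
    g = np.exp(-0.5 * delta)            # gamma_n
    g2 = g * g                          # gamma_n^2 = exp(-delta_n)
    m_pred = g * m
    v_pred = g2 * v + (1.0 - g2) * sigma0_sq
    s = v_pred + noise_var              # predictive variance of y_n
    e = y - m_pred                      # prediction error
    log_pred = -0.5 * (np.log(2.0 * np.pi * s) + e * e / s)
    dm = -0.5 * g * m                   # d m_pred / d delta
    ds = -g2 * (v - sigma0_sq)          # d s / d delta
    grad = -0.5 * ds / s + e * dm / s + 0.5 * e * e * ds / s**2
    return log_pred, grad, m_pred, v_pred, s


def run_filter(ys, lr=0.01, delta0=0.0, sigma0_sq=1.0, noise_var=0.1):
    """Online filtering with gradient-ascent updates of delta (hypothetical helper)."""
    m, v, delta = 0.0, sigma0_sq, delta0
    total, avg_log_pred, deltas = 0.0, [], []
    for n, y in enumerate(ys, start=1):
        lp, grad, m_pred, v_pred, s = predictive_and_grad(m, v, y, delta, sigma0_sq, noise_var)
        total += lp
        avg_log_pred.append(total / n)  # (1/n) * sum_i log p(y_i | y_{1:i-1})
        deltas.append(delta)
        # Kalman update of the weight posterior, using the prediction just scored.
        k = v_pred / s
        m = m_pred + k * (y - m_pred)
        v = (1.0 - k) * v_pred
        # One gradient-ascent step on delta, clamped so that gamma_n <= 1.
        delta = max(0.0, delta + lr * grad)
    return np.array(avg_log_pred), np.array(deltas)


# Usage on a hypothetical piecewise-constant series with two abrupt jumps.
rng = np.random.default_rng(0)
levels = np.repeat([0.0, 3.0, -2.0], 200)
ys = levels + np.sqrt(0.1) * rng.normal(size=levels.size)
avg_learned, deltas = run_filter(ys, lr=0.01)
avg_fixed, _ = run_filter(ys, lr=0.0)   # delta_n = 0, i.e. gamma_n = 1 for all n
```

Setting `lr=0.0` with `delta0=0.0` keeps $\gamma_n = 1$ for every $n$, which corresponds to the non-adaptive baseline of Figures 1 and 4. Plotting `avg_learned` against `avg_fixed` gives the same kind of comparison as the right panel of Figure 1. A central finite-difference check of `grad` is a cheap way to validate the closed-form derivative before relying on it.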