MODL: Multilearner Online Deep Learning

Antonios Valkanas; Boris N. Oreshkin; Mark Coates

TL;DR

MODL tackles online learning with streaming data and missing features by proposing a multilearner stacking framework that blends a fast online logistic regression with slower deep components, including a ProtoRes-based set learner. The core idea is to aggregate latent scores from diverse learners via a summation and a final softmax to produce predictions, avoiding hedge backpropagation and achieving faster convergence than prior online deep-learning methods. Empirical results on eight benchmarks show significant improvements in accuracy and training efficiency, with large reductions in per-step computation from $O(nL^2)$ to $O(nL)$ and robust performance across varying feature availability. The work demonstrates practical impact for real-time systems by delivering scalable online deep learning with strong empirical performance and accessible code.
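The aggregation step described above (sum-pool the latent class scores of all learners, then apply one softmax) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the three example score vectors are hypothetical, and each learner is assumed to emit one unnormalized score per class.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def aggregate_predictions(latent_scores):
    """Sum-pool per-learner latent class scores, then map to probabilities.

    latent_scores: list of arrays, one per learner, each of shape (n_classes,).
    """
    pooled = np.sum(np.stack(latent_scores, axis=0), axis=0)
    return softmax(pooled)


# Hypothetical scores from three learners on a 3-class problem.
scores = [
    np.array([2.0, 0.5, -1.0]),   # e.g. a fast shallow learner
    np.array([0.1, 0.3, 0.0]),    # e.g. a deeper learner early in training
    np.array([-0.2, 1.0, 0.4]),   # e.g. a set learner
]
probs = aggregate_predictions(scores)
print(probs)
```

Because the scores are added before the softmax, a learner that is still near its initialization contributes roughly constant scores and leaves the prediction to the learners that have already adapted.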

Abstract

Online deep learning tackles the challenge of learning from data streams by balancing two competing goals: fast learning and deep learning. However, existing research primarily emphasizes deep learning solutions, which are more adept at handling the ``deep'' aspect than the ``fast'' aspect of online learning. In this work, we introduce an alternative paradigm through a hybrid multilearner approach. We begin by developing a fast online logistic regression learner, which operates without relying on backpropagation. It leverages closed-form recursive updates of model parameters, efficiently addressing the fast learning component of the online learning challenge. This approach is further integrated with a cascaded multilearner design, where shallow and deep learners are co-trained in a cooperative, synergistic manner to solve the online learning problem. We demonstrate that this approach achieves state-of-the-art performance on standard online learning datasets. We make our code available: https://github.com/AntonValk/MODL
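To make the idea of a backpropagation-free learner with closed-form recursive updates concrete, here is a minimal sketch for the binary case. It uses a textbook extended-Kalman-filter-style recursion (a Gaussian approximation of the weight posterior, updated with a rank-one correction), which is an assumption for illustration and not necessarily the exact recursion derived in the paper. The class name, `prior_var`, and the synthetic stream are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class RecursiveLogisticRegression:
    """Binary logistic regression updated one sample at a time, no gradients stored."""

    def __init__(self, dim, prior_var=10.0):
        self.theta = np.zeros(dim)          # posterior mean of the weights
        self.P = prior_var * np.eye(dim)    # posterior covariance approximation

    def predict_proba(self, x):
        return sigmoid(self.theta @ x)

    def update(self, x, y):
        p = self.predict_proba(x)
        h = p * (1.0 - p)                   # local curvature of the log-likelihood
        Px = self.P @ x
        # Rank-one covariance update (Sherman-Morrison on P^{-1} + h x x^T).
        self.P = self.P - (h / (1.0 + h * (x @ Px))) * np.outer(Px, Px)
        # Mean update driven by the prediction error (y - p).
        self.theta = self.theta + (self.P @ x) * (y - p)
        return p


# Synthetic stream: labels drawn from a logistic model with known weights.
rng = np.random.default_rng(0)
true_theta = np.array([2.0, -1.0, 0.5])
model = RecursiveLogisticRegression(dim=3)
for _ in range(2000):
    x = rng.normal(size=3)
    y = float(rng.random() < sigmoid(true_theta @ x))
    model.update(x, y)
print(model.theta)
```

Each step costs one matrix-vector product and one outer product, so the learner adapts after a single pass over every sample without iterative optimization.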

Paper Structure (39 sections, 2 theorems, 64 equations, 6 figures, 13 tables, 3 algorithms)

INTRODUCTION
RELATED WORK
PROBLEM STATEMENT
METHODOLOGY
Multilearner Online Deep Learning
EXPERIMENTS
CONCLUSION
MODL: Multilearner Online Deep Learning (Supplementary material)
Broader Impact Statement
Additional Background
Model weighing approaches
Online Learning with Missing Features
Fast Online Deep Learning
Algorithms
Additional Experiments
...and 24 more sections

Key Result

Proposition 1

Assuming an input feature distribution for $\mathbf{x}$ that is approximately normal, and linearizing the non-linear relationship, $\mathbf{y} = \sigma(\theta \mathbf{x})$, a quadratic approximation to the posterior of model weights after observing the $n$-th datapoint is given by the recursive form

Figures (6)

Figure 1: Multilearner Online Deep Learning. The dataset is streamed sequentially. Fast learners quickly adapt to the data distribution, providing a strong baseline for the deeper models. By synergizing models with different bias-variance trade-offs, the overall architecture quickly adapts to the data and learns deep representations. Individual model latent class scores $\widetilde{p}_i$ are sum-pooled, then projected into class probabilities. During co-learning, models learn to predict on top of each other. Non-neural learners (green box) learn via filtering-style updates; $f(y-\widetilde{p}_1)$ is given by \ref{eq:update2}. Neural learners (orange box) learn via backpropagation of the cross-entropy loss $\mathcal{L}_{\text{CE}}$.
Figure 2: Comparison of misclassification rate (lower is better) as a function of online training step for Aux-Drop (ODL) (Agarwal et al., 2023) (blue) vs. our proposed model MODL (red). Shaded regions indicate 95% C.I.
Figure 3: Exact log-likelihood vs. our proposed quadratic approximation. Close to the data-generating parameters (marked as x), our approximation and the exact log-likelihood function agree. See \ref{app:toy_exp} for toy dataset experiment details.
Figure 4: Visualization of our set learning decoder.
Figure 5: ODL hedge backpropagation. Red lines indicate individual backpropagation calculations.
...and 1 more figure

Theorems & Definitions (3)

Proposition 1
Proposition 2
proof
