Don't Look Back in Anger: MAGIC Net for Streaming Continual Learning with Temporal Dependence

Federico Giannini, Sandro D'Andrea, Emanuele Della Valle

Abstract

Concept drift, temporal dependence, and catastrophic forgetting represent major challenges when learning from data streams. While Streaming Machine Learning and Continual Learning (CL) address these issues separately, recent efforts in Streaming Continual Learning (SCL) aim to unify them. In this work, we introduce MAGIC Net, a novel SCL approach that integrates CL-inspired architectural strategies with recurrent neural networks to tame temporal dependence. MAGIC Net continuously learns, looks back at past knowledge by applying learnable masks over frozen weights, and expands its architecture when necessary. It performs all operations online, ensuring inference availability at all times. Experiments on synthetic and real-world streams show that it improves adaptation to new concepts, limits memory usage, and mitigates forgetting.

Paper Structure

This paper contains 8 sections, 9 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Forward step on the feature vector $X_t$ during the ensemble phase. For simplicity, a single Mask option is shown to represent both MaskFineTune and MaskRandom. These options build their weights (GRU$^M$, OUT$^M$) by multiplying the frozen cGRU weights by masks in $(0,1)$, which are obtained by applying a sigmoid to learnable values. The Expand option adds new learnable weights to the GRU layer: it applies learnable masks to the frozen cGRU weights and frozen masks equal to 1 to the new learnable weights, yielding GRU$^E$ and OUT$^E$. The opacity of the masked weights increases as the mask approaches 1.
  • Figure 2: Average Cohen's Kappa over time on 50 configurations of real data, using a concept drift detector with 100% precision and recall. Scores are first averaged per concept, then across configurations. MAGIC Net outperforms the other methods across the stream. On PowerConsumption and Weather, cPNN starts slightly worse. MAGIC Net is clearly the best at the end of each concept.
  • Figure 3: Average sizes in MB on the 50 configurations of the real data sources, shown over the detections of a concept drift detector with 70% precision and 100% recall. By deciding when to expand the architecture, MAGIC Net reduces the size as new detections arise. This result supports H3.
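
The masking and expansion described in the Figure 1 caption can be illustrated with a small sketch. This is not the authors' implementation: the function names (`masked_weights`, `expanded_weights`) are hypothetical, the product is assumed to be element-wise, and a GRU weight matrix is reduced to a plain nested list. Expansion is simplified to appending new rows, whereas a real GRU would also need new recurrent columns.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def masked_weights(frozen, mask_logits):
    """Multiply frozen weights by sigmoid masks (assumed element-wise).

    `frozen` stays untouched; only `mask_logits` would be trained.
    """
    return [
        [w * sigmoid(m) for w, m in zip(w_row, m_row)]
        for w_row, m_row in zip(frozen, mask_logits)
    ]


def expanded_weights(frozen, mask_logits, new_weights):
    """Masked frozen block followed by new rows.

    The new rows carry a frozen mask equal to 1, so they pass through
    unchanged (simplified: rows only, no new recurrent columns).
    """
    return masked_weights(frozen, mask_logits) + [list(r) for r in new_weights]


# Toy values, chosen only for illustration.
frozen_w = [[1.0, -2.0], [0.5, 4.0]]
mask_logits = [[0.0, 0.0], [0.0, 0.0]]  # sigmoid(0) = 0.5 for every entry
new_w = [[0.1, 0.2]]

gru_m = masked_weights(frozen_w, mask_logits)
gru_e = expanded_weights(frozen_w, mask_logits, new_w)
print(gru_m)
print(gru_e)
```

Because the frozen weights are never modified, dropping the masks recovers the earlier model exactly, which is consistent with the abstract's description of looking back at past knowledge through learnable masks over frozen weights.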