VERSE: Virtual-Gradient Aware Streaming Lifelong Learning with Anytime Inference

Soumya Banerjee; Vinay K. Verma; Avideep Mukherjee; Deepak Gupta; Vinay P. Namboodiri; Piyush Rai

TL;DR

This work proposes a novel virtual-gradient-based approach to continual representation learning that adapts to each new example while still generalizing well on past data, thereby preventing catastrophic forgetting.

Abstract

Lifelong learning, or continual learning, is the problem of training an AI agent continuously while preventing it from forgetting previously acquired knowledge. Streaming lifelong learning is a challenging variant of this setting in which the agent must learn continuously in a dynamic, non-stationary environment without forgetting. We introduce a novel approach to lifelong learning that is streaming (it observes each training example only once), requires a single pass over the data, can learn in a class-incremental manner, and can be evaluated on the fly (anytime inference). To achieve this, we propose a novel virtual-gradient-based approach to continual representation learning that adapts to each new example while also generalizing well on past data to prevent catastrophic forgetting. Our approach also leverages an exponential-moving-average-based semantic memory to further enhance performance. Experiments on diverse datasets with temporally correlated observations demonstrate our method's efficacy and superior performance over existing methods.
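The exponential-moving-average (EMA) semantic memory mentioned above can be illustrated with a minimal sketch. The abstract does not give the update rule, so the standard EMA form below, the function name `ema_update`, and the `decay` value are all assumptions made for illustration rather than the paper's implementation.

```python
def ema_update(semantic, working, decay=0.99):
    """Blend working-model weights into a slowly moving semantic copy.

    Assumed standard EMA form (not taken from the paper):
        semantic <- decay * semantic + (1 - decay) * working
    Both arguments are flat lists of floats of equal length.
    """
    if len(semantic) != len(working):
        raise ValueError("weight vectors must have the same length")
    return [decay * s + (1.0 - decay) * w for s, w in zip(semantic, working)]


# Toy usage: the semantic copy drifts slowly toward the working weights.
semantic_weights = [0.0, 0.0]
working_weights = [1.0, -2.0]
semantic_weights = ema_update(semantic_weights, working_weights, decay=0.9)
```

A decay close to 1 makes the semantic copy change slowly, which is the usual reason an EMA model is treated as a more stable target than the rapidly updated working model.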

Paper Structure (13 sections, 6 equations, 8 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Proposed Approach
Virtual Gradient as Regularizer
Tiny Episodic Memory (TEM)
Semantic Memory (SEM)
Experiments
Datasets, Data Orderings and Metrics
Baselines and Compared Methods
Implementation Details
Results
Ablations
Conclusion

Figures (8)

Figure 1: Streaming lifelong learning (SLL) involves learning continuously, without forgetting, from non-i.i.d. labeled streams with multiple views. The figure shows temporally ordered frames of a cup from CoRe50 (Lomonaco and Maltoni, 2017).
Figure 2: In VERSE, virtual-gradient regularization (VGR) enables continual learning by adapting to the new sample(s) with a virtual model $(\theta^{v})$, from which the final model $(\theta)$ is computed through rehearsal. Episodic memory (TEM) stores a few observed samples, while semantic memory (SEM) enforces consistency through a self-distillation loss, improving overall performance.
Figure 3: Plots of $\boldsymbol{\alpha}_{t}$ as a function of streaming learning model and data-orderings. VERSE outperforms other SLL models in both streaming class-iid (top-row) and streaming class-instance (bottom-row) orderings across datasets.
Figure 4: Performance ($\boldsymbol{\mu}_{\text{all}}$) comparison between VERSE (Ours) and the other baselines on ImageNet100.
Figure 5: Performance ($\boldsymbol{\mu}_{\text{all}}$) comparison between VERSE and Lifelong MAML (Gupta et al., 2020) on iCub1.0.
...and 3 more figures
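
The two-stage update described in the Figure 2 caption (adapt a virtual model $\theta^{v}$ to the new sample, then obtain the final model $\theta$ through rehearsal) can be sketched on a toy problem. This is only one plausible reading of the caption: the exact update rule is not given in this section, and the scalar model, the squared loss, the function names, and the learning rate are all hypothetical choices made for illustration.

```python
def grad(w, x, y):
    """Gradient of the squared error (w * x - y) ** 2 with respect to w."""
    return 2.0 * (w * x - y) * x


def virtual_gradient_step(w, new_sample, memory, lr=0.1):
    """One streaming update on a scalar linear model y = w * x.

    Assumed reading of the caption, not the paper's exact rule:
      1. take a virtual step on the new sample only, giving w_virtual;
      2. evaluate the average gradient at w_virtual on the new sample
         together with the rehearsal samples drawn from episodic memory;
      3. apply that gradient to the original weight w.
    """
    x, y = new_sample
    w_virtual = w - lr * grad(w, x, y)                      # step 1
    batch = [new_sample] + list(memory)
    g = sum(grad(w_virtual, bx, by) for bx, by in batch) / len(batch)  # step 2
    return w - lr * g                                       # step 3


# Toy usage: the target relation is y = 2 * x, starting from w = 0.
w_new = virtual_gradient_step(0.0, new_sample=(1.0, 2.0), memory=[(1.0, 2.0)])
```

Under this reading, the virtual step acts as a look-ahead: the gradient used for the real update is measured where the model would land after fitting the new sample, so the rehearsal samples can counteract any drift away from past data.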
