t-DGR: A Trajectory-Based Deep Generative Replay Method for Continual Learning in Decision Making

William Yue; Bo Liu; Peter Stone

TL;DR

This paper proposes a simple, scalable, non-autoregressive method for continual learning in decision-making tasks. The method uses a generative model that generates task samples conditioned on the trajectory timestep, and it achieves state-of-the-art performance on the average success rate metric among continual learning methods.

Abstract

Deep generative replay has emerged as a promising approach for continual learning in decision-making tasks. This approach addresses the problem of catastrophic forgetting by leveraging the generation of trajectories from previously encountered tasks to augment the current dataset. However, existing deep generative replay methods for continual learning rely on autoregressive models, which suffer from compounding errors in the generated trajectories. In this paper, we propose a simple, scalable, and non-autoregressive method for continual learning in decision-making tasks using a generative model that generates task samples conditioned on the trajectory timestep. We evaluate our method on Continual World benchmarks and find that our approach achieves state-of-the-art performance on the average success rate metric among continual learning methods. Code is available at https://github.com/WilliamYue37/t-DGR.

Paper Structure (37 sections, 8 equations, 3 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Continual Learning in the Real World
Continual Learning Methods
Regularization
Architecture-based Methods
Pseudo-rehearsal Methods
Background
Imitation Learning
Continual Imitation Learning
Diffusion Probabilistic Models
Notation
Method
Architecture
Experiments
...and 22 more sections

Figures (3)

Figure 1: The first row presents a comparison of three generative methods for imitating an agent's movement in a continuous 2D plane with Gaussian noise. The objective is to replicate the ground truth path, which transitions from darker to lighter colors. The autoregressive method (CRIL) encounters a challenge at the first sharp turn as nearby points move in opposing directions. Once the autoregressive method deviates off course, it never recovers and compromises the remaining trajectory. In contrast, sampling individual state observations i.i.d. without considering the temporal nature of trajectories (DGR) leads to a fragmented path with numerous gaps. Our proposed method t-DGR samples individual state observations conditioned on the trajectory timestep. By doing so, t-DGR successfully avoids the pitfalls of CRIL and DGR, ensuring a more accurate replication of the desired trajectory. The second row illustrates how each method generates trajectory data. CRIL generates the next state observation conditioned on the previous state observation. DGR, in contrast, does not attempt to generate a trajectory but generates individual state observations i.i.d. On the other hand, t-DGR generates state observations conditioned on the trajectory timestep.
Figure 2: The deep generative replay paradigm. The algorithm learns to generate trajectories from past tasks to augment real trajectories from the current task in order to mitigate catastrophic forgetting. Both the generator and policy model are updated with this augmented dataset.
Figure 3: This figure illustrates the ability of the diffusion model in t-DGR to generate past data as it continues to learn additional tasks in CW10 through generative replay. The line plot for task $i$ shows the diffusion model's average diffusion loss on task $i$ data as it trains on subsequent tasks. The loss is an L1 version of the diffusion training loss in Equation \ref{eq:diffLoss}.
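
The contrast drawn in the Figure 1 caption between autoregressive generation and timestep-conditioned generation can be illustrated with a toy sketch. The code below is not the paper's diffusion model or its experiment: the path `ground_truth`, the Gaussian noise model, and all function names are hypothetical stand-ins chosen for illustration. It assumes each "generator" is perfect apart from a small independent error per generated state, and it only shows why chaining predictions lets those errors accumulate while conditioning each sample on its timestep does not.

```python
import math
import random


def ground_truth(t):
    """Hypothetical 2D path with a sharp turn at t = 0.5 (illustrative only)."""
    return (t, t if t < 0.5 else 1.0 - t)


def timesteps(num_steps):
    """Evenly spaced normalized timesteps in [0, 1]."""
    return [i / (num_steps - 1) for i in range(num_steps)]


def autoregressive_rollout(num_steps, noise_std, rng):
    """Generate each state from the previously generated state.

    The true displacement is added to the last generated state together
    with Gaussian noise, so every earlier error is carried forward.
    """
    ts = timesteps(num_steps)
    state = ground_truth(ts[0])
    states = [state]
    for prev_t, next_t in zip(ts, ts[1:]):
        (px, py), (nx, ny) = ground_truth(prev_t), ground_truth(next_t)
        state = (
            state[0] + (nx - px) + rng.gauss(0.0, noise_std),
            state[1] + (ny - py) + rng.gauss(0.0, noise_std),
        )
        states.append(state)
    return states


def timestep_conditioned_samples(num_steps, noise_std, rng):
    """Generate each state independently, conditioned only on its timestep."""
    samples = []
    for t in timesteps(num_steps):
        x, y = ground_truth(t)
        samples.append((x + rng.gauss(0.0, noise_std), y + rng.gauss(0.0, noise_std)))
    return samples


def mean_error(states):
    """Average Euclidean distance between generated states and the true path."""
    ts = timesteps(len(states))
    errors = [math.dist(s, ground_truth(t)) for s, t in zip(states, ts)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    rng = random.Random(0)
    ar = mean_error(autoregressive_rollout(200, 0.01, rng))
    tc = mean_error(timestep_conditioned_samples(200, 0.01, rng))
    print(f"autoregressive: {ar:.4f}, timestep-conditioned: {tc:.4f}")
```

In this simplified setting the autoregressive error behaves like a random walk and grows with trajectory length, whereas the timestep-conditioned error stays at the per-sample noise level. The sketch leaves out the i.i.d. DGR baseline, which has no notion of timestep and therefore cannot be scored against the path position by position.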
