t-DGR: A Trajectory-Based Deep Generative Replay Method for Continual Learning in Decision Making

William Yue, Bo Liu, Peter Stone

TL;DR

This paper proposes a simple, scalable, non-autoregressive method for continual learning in decision-making tasks. A generative model produces task samples conditioned on the trajectory timestep, and on Continual World benchmarks the approach achieves state-of-the-art average success rate among continual learning methods.

Abstract

Deep generative replay has emerged as a promising approach for continual learning in decision-making tasks. This approach addresses the problem of catastrophic forgetting by leveraging the generation of trajectories from previously encountered tasks to augment the current dataset. However, existing deep generative replay methods for continual learning rely on autoregressive models, which suffer from compounding errors in the generated trajectories. In this paper, we propose a simple, scalable, and non-autoregressive method for continual learning in decision-making tasks using a generative model that generates task samples conditioned on the trajectory timestep. We evaluate our method on Continual World benchmarks and find that our approach achieves state-of-the-art performance on the average success rate metric among continual learning methods. Code is available at https://github.com/WilliamYue37/t-DGR.
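
The compounding-error argument can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's implementation: the path `true_path`, the Gaussian noise standing in for model error, and all function names are hypothetical. An autoregressive rollout builds each state from the previously generated one, so per-step errors accumulate. A timestep-conditioned rollout generates each state from its timestep alone, so errors stay independent.

```python
import math
import random


def true_path(t, horizon):
    """Hypothetical ground-truth 2D path with one sharp turn halfway through."""
    half = horizon // 2
    if t <= half:
        return (float(t), 0.0)              # first leg: move right
    return (float(half), float(t - half))   # second leg: move up


def autoregressive_rollout(horizon, noise, rng):
    """Build each state from the previous generated state; noise accumulates."""
    x, y = true_path(0, horizon)
    states = [(x, y)]
    for t in range(1, horizon + 1):
        px, py = true_path(t - 1, horizon)
        cx, cy = true_path(t, horizon)
        # True step plus Gaussian noise (an assumed stand-in for model error).
        x += (cx - px) + rng.gauss(0.0, noise)
        y += (cy - py) + rng.gauss(0.0, noise)
        states.append((x, y))
    return states


def timestep_conditioned_rollout(horizon, noise, rng):
    """Generate each state from its timestep alone; errors do not propagate."""
    states = []
    for t in range(horizon + 1):
        cx, cy = true_path(t, horizon)
        states.append((cx + rng.gauss(0.0, noise), cy + rng.gauss(0.0, noise)))
    return states


def mean_final_error(rollout, trials=200, horizon=100, noise=0.1, seed=0):
    """Average distance between the last generated state and the true endpoint."""
    rng = random.Random(seed)
    gx, gy = true_path(horizon, horizon)
    total = 0.0
    for _ in range(trials):
        x, y = rollout(horizon, noise, rng)[-1]
        total += math.hypot(x - gx, y - gy)
    return total / trials
```

Calling `mean_final_error(autoregressive_rollout)` and `mean_final_error(timestep_conditioned_rollout)` compares the two schemes at the end of the trajectory. In this toy setting the autoregressive error grows with the horizon, while the timestep-conditioned error stays at the scale of a single step's noise.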

Paper Structure (37 sections, 8 equations, 3 figures, 2 tables, 1 algorithm)

This paper contains 37 sections, 8 equations, 3 figures, 2 tables, 1 algorithm.

Figures (3)

  • Figure 1: The first row presents a comparison of three generative methods for imitating an agent's movement in a continuous 2D plane with Gaussian noise. The objective is to replicate the ground truth path, which transitions from darker to lighter colors. The autoregressive method (CRIL) encounters a challenge at the first sharp turn as nearby points move in opposing directions. Once the autoregressive method deviates off course, it never recovers and compromises the remaining trajectory. In contrast, sampling individual state observations i.i.d. without considering the temporal nature of trajectories (DGR) leads to a fragmented path with numerous gaps. Our proposed method t-DGR samples individual state observations conditioned on the trajectory timestep. By doing so, t-DGR successfully avoids the pitfalls of CRIL and DGR, ensuring a more accurate replication of the desired trajectory. The second row illustrates how each method generates trajectory data. CRIL generates the next state observation conditioned on the previous state observation. DGR, in contrast, does not attempt to generate a trajectory but generates individual state observations i.i.d. On the other hand, t-DGR generates state observations conditioned on the trajectory timestep.
  • Figure 2: The deep generative replay paradigm. The algorithm learns to generate trajectories from past tasks to augment real trajectories from the current task in order to mitigate catastrophic forgetting. Both the generator and policy model are updated with this augmented dataset.
  • Figure 3: This figure illustrates the ability of the diffusion model in t-DGR to generate past data as it continues to learn additional tasks in CW10 through generative replay. The line plot for task $i$ shows the diffusion model's average loss on task $i$ data while it trains on subsequent tasks. The loss is an L1 version of the diffusion training loss in Equation \ref{eq:diffLoss}.
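
The data flow of the deep generative replay paradigm in Figure 2 can be sketched as follows. This is a hedged illustration only: `ToyGenerator` is a hypothetical stand-in that stores per-(task, timestep) means rather than a diffusion model, observations are single floats, and replaying every timestep of every past task is an assumption made here for simplicity. The point is the loop structure: for each new task, generated samples from past tasks are added to the real data, and both the generator and the policy are updated on the augmented dataset.

```python
class ToyGenerator:
    """Stand-in for a learned generative model (NOT a diffusion model).

    It remembers the mean observation for each (task_id, timestep) pair.
    """

    def __init__(self):
        self.means = {}  # (task_id, timestep) -> mean observation

    def fit(self, samples):
        """Refit from scratch on (task_id, timestep, observation) triples."""
        sums = {}
        for task_id, t, obs in samples:
            total, count = sums.get((task_id, t), (0.0, 0))
            sums[(task_id, t)] = (total + obs, count + 1)
        self.means = {key: total / count for key, (total, count) in sums.items()}

    def sample(self, task_id, t):
        """Generate an observation conditioned on the task and the timestep."""
        return self.means[(task_id, t)]


def replay(generator, past_tasks, horizon):
    """Generate one sample per timestep for every previously seen task."""
    return [(task_id, t, generator.sample(task_id, t))
            for task_id in past_tasks
            for t in range(horizon)]


def continual_training(task_datasets, horizon, update_policy=None):
    """Train on tasks one at a time, augmenting each with replayed samples."""
    generator = ToyGenerator()
    past_tasks = []
    sizes = []
    for task_id, real in enumerate(task_datasets):
        # Augment real data from the current task with generated past data.
        augmented = list(real) + replay(generator, past_tasks, horizon)
        # Both the generator and the policy see the augmented dataset.
        generator.fit(augmented)
        if update_policy is not None:
            update_policy(augmented)  # hypothetical policy-update callback
        past_tasks.append(task_id)
        sizes.append(len(augmented))
    return generator, sizes
```

Because the generator is refit only on the augmented dataset, anything it still knows about task 0 after training on later tasks comes from its own replayed samples, which is the mechanism that counters catastrophic forgetting in this paradigm.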