Towards General Purpose Robots at Scale: Lifelong Learning and Learning to Use Memory

William Yue

TL;DR

The paper addresses two obstacles to deploying general-purpose robots over long time horizons: memory and lifelong learning. It introduces t-DGR, a non-autoregressive, trajectory-based deep generative replay method that achieves state-of-the-art performance on Continual World benchmarks, and AttentionTuner, a memory-guided learning framework that teaches Transformer-based agents to use memory through human-annotated memory dependency pairs. Experiments on Continual World and on memory-demanding LTMB tasks show that t-DGR mitigates catastrophic forgetting and that AttentionTuner improves long-term credit assignment and generalization, even with very sparse annotations. Together, the two methods combine scalable continual learning with memory-aware imitation learning, a step toward robots that learn durably and use memory efficiently in unstructured real-world environments.

Abstract

The widespread success of artificial intelligence in fields like natural language processing and computer vision has not yet fully transferred to robotics, where progress is hindered by the lack of large-scale training data and the complexity of real-world tasks. To address this, many robot learning researchers are pushing to get robots deployed at scale in everyday unstructured environments like our homes to initiate a data flywheel. While current robot learning systems are effective for certain short-horizon tasks, they are not designed to autonomously operate over long time horizons in unstructured environments. This thesis focuses on addressing two key challenges for robots operating over long time horizons: memory and lifelong learning. We propose two novel methods to advance these capabilities. First, we introduce t-DGR, a trajectory-based deep generative replay method that achieves state-of-the-art performance on Continual World benchmarks, advancing lifelong learning. Second, we develop a framework that leverages human demonstrations to teach agents effective memory utilization, improving learning efficiency and success rates on Memory Gym tasks. Finally, we discuss future directions for achieving the lifelong learning and memory capabilities necessary for robots to function at scale in real-world settings.

Paper Structure (97 sections, 14 equations, 19 figures, 11 tables, 2 algorithms)

Introduction
Lifelong Learning
Introduction
Related Work
Continual Learning in the Real World
Continual Learning Methods
Regularization
Architecture-based Methods
Pseudo-rehearsal Methods
Background
Imitation Learning
Continual Imitation Learning
Diffusion Probabilistic Models
Notation
Method
...and 82 more sections

Figures (19)

Figure 1: A humanoid operating in a living room.
Figure 2: The data flywheel for robotics.
Figure 3: Roomba robot vacuuming the carpet of a residential game room.
Figure 4: The first row presents a comparison of three generative methods for imitating an agent's movement in a continuous 2D plane with Gaussian noise. The objective is to replicate the ground truth path, which transitions from darker to lighter colors. The autoregressive method (CRIL) encounters a challenge at the first sharp turn as nearby points move in opposing directions. Once the autoregressive method deviates off course, it never recovers and compromises the remaining trajectory. In contrast, sampling individual state observations i.i.d. without considering the temporal nature of trajectories (DGR) leads to a fragmented path with numerous gaps. Our proposed method t-DGR samples individual state observations conditioned on the trajectory timestep. By doing so, t-DGR successfully avoids the pitfalls of CRIL and DGR, ensuring a more accurate replication of the desired trajectory. The second row illustrates how each method generates trajectory data. CRIL generates the next state observation conditioned on the previous state observation. DGR, in contrast, does not attempt to generate a trajectory but generates individual state observations i.i.d. On the other hand, t-DGR generates state observations conditioned on the trajectory timestep.
Figure 5: The deep generative replay paradigm. The algorithm learns to generate trajectories from past tasks to augment real trajectories from the current task in order to mitigate catastrophic forgetting. Both the generator and policy model are updated with this augmented dataset.
...and 14 more figures
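
The three sampling schemes contrasted in the Figure 4 caption can be sketched on a toy problem. The code below is a minimal illustration, not the thesis implementation: the path shape, the noise level `sigma`, the nearest-neighbour step lookup standing in for a learned autoregressive model, and all function names are assumptions made for this example. The real methods use learned generative models (t-DGR uses a diffusion model), which are replaced here by ground-truth lookups plus Gaussian noise.

```python
import numpy as np


def ground_truth_path(n_steps=50):
    """Toy 2D path with one sharp turn: out along x, then back, slightly higher."""
    half = n_steps // 2
    out = np.stack([np.linspace(0.0, 1.0, half), np.zeros(half)], axis=1)
    back = np.stack(
        [np.linspace(1.0, 0.0, n_steps - half), np.full(n_steps - half, 0.05)],
        axis=1,
    )
    return np.concatenate([out, back], axis=0)


def sample_autoregressive(path, sigma, rng):
    """CRIL-style stand-in: each state depends only on the previously generated state."""
    deltas = np.diff(path, axis=0)
    states = [path[0]]
    for _ in range(len(path) - 1):
        # Hypothetical "model": apply the step of the closest ground-truth point.
        # Near the turn, close-by points carry opposing steps, so errors can compound.
        nearest = np.argmin(np.linalg.norm(path[:-1] - states[-1], axis=1))
        states.append(states[-1] + deltas[nearest] + rng.normal(0.0, sigma, size=2))
    return np.array(states)


def sample_iid(path, sigma, rng):
    """DGR-style stand-in: states drawn i.i.d., ignoring trajectory order."""
    idx = rng.integers(0, len(path), size=len(path))
    return path[idx] + rng.normal(0.0, sigma, size=path.shape), idx


def sample_timestep_conditioned(path, sigma, rng):
    """t-DGR-style stand-in: one state generated for every trajectory timestep."""
    return path + rng.normal(0.0, sigma, size=path.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    path = ground_truth_path()
    ar = sample_autoregressive(path, 0.01, rng)
    iid, idx = sample_iid(path, 0.01, rng)
    td = sample_timestep_conditioned(path, 0.01, rng)
    print("autoregressive mean error:", np.linalg.norm(ar - path, axis=1).mean())
    print("i.i.d. timesteps covered:", len(set(idx.tolist())), "of", len(path))
    print("timestep-conditioned mean error:", np.linalg.norm(td - path, axis=1).mean())
```

In this sketch the i.i.d. sampler draws timesteps with replacement, so some timesteps are typically never visited, while the timestep-conditioned sampler produces exactly one state per timestep and its error does not depend on earlier samples.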
