Online Continual Learning For Interactive Instruction Following Agents
Byeonghwi Kim, Minhyuk Seo, Jonghyun Choi
TL;DR
This work introduces two online continual-learning paradigms for embodied agents—Behavior-IL and Environment-IL—to enable continual acquisition of new behaviors and environments in interactive instruction following. It presents Confidence-Aware Moving Average (CAMA), a task-free logit-update mechanism that dynamically weights past and current logits based on predicted confidence, mitigating outdated knowledge without requiring task boundaries. Through extensive experiments on the ALFRED benchmark, CAMA consistently surpasses state-of-the-art baselines (replay, regularization, and distillation methods) on both seen and unseen environments, demonstrating robust continual adaptation with language understanding and object localization. The approach offers a practical path toward real-world embodied agents that learn continuously from streaming experiences while preserving previously learned capabilities.
Abstract
In training an embodied agent to execute daily tasks from language directives, the literature largely assumes that the agent learns from all training data at the beginning. We argue that such a learning scenario is less realistic, since a robotic agent is supposed to learn about the world continuously as it explores and perceives it. To take a step towards a more realistic embodied agent learning scenario, we propose two continual learning setups for embodied agents: learning new behaviors (Behavior Incremental Learning, Behavior-IL) and learning new environments (Environment Incremental Learning, Environment-IL). For these setups, previous 'data prior' based continual learning methods maintain logits for the past tasks. However, the stored information is often insufficiently learned, and these methods require task boundary information, which might not always be available. Here, we propose to update the stored logits based on confidence scores, in a moving-average fashion and without task boundary information during training (i.e., task-free); we name this method Confidence-Aware Moving Average (CAMA). In the proposed Behavior-IL and Environment-IL setups, our simple CAMA outperforms the prior state of the art by noticeable margins in our empirical validations. The project page, including code, is at https://github.com/snumprlab/cl-alfred.
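As a rough illustration of the idea of updating stored logits by confidence in a moving-average fashion, the sketch below blends a stored logit vector with freshly computed logits. The abstract does not give the exact weighting rule, so everything specific here is an assumption made for illustration: the function names `softmax` and `confidence_weighted_update` are hypothetical, and the blending weight is taken to be the current model's softmax confidence on the ground-truth class. It is not the authors' implementation; see the project repository for that.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_weighted_update(stored_logits, new_logits, label):
    """Blend stored logits with new logits for one memory sample.

    ASSUMPTION (not from the paper): the moving-average weight is the
    current model's softmax confidence on the ground-truth class `label`.
    High confidence moves the stored logits towards the new ones; low
    confidence keeps them close to their previous values. No task
    boundary information is used.
    """
    weight = softmax(new_logits)[label]
    return [
        (1.0 - weight) * old + weight * new
        for old, new in zip(stored_logits, new_logits)
    ]


# Example: a stored logit vector refreshed with a confident new prediction.
stored = [2.0, 0.0, -1.0]
fresh = [4.0, 0.5, -2.0]
updated = confidence_weighted_update(stored, fresh, label=0)
```

Because the weight is recomputed from the current prediction for every sample, the update needs no signal about when one task ends and the next begins, which is the task-free property the abstract emphasizes.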
