
Online Continual Learning For Interactive Instruction Following Agents

Byeonghwi Kim, Minhyuk Seo, Jonghyun Choi

TL;DR

This work introduces two online continual-learning paradigms for embodied agents—Behavior-IL and Environment-IL—to enable continual acquisition of new behaviors and environments in interactive instruction following. It presents Confidence-Aware Moving Average (CAMA), a task-free logit-update mechanism that dynamically weights past and current logits based on predicted confidence, mitigating outdated knowledge without requiring task boundaries. Through extensive experiments on the ALFRED benchmark, CAMA consistently surpasses state-of-the-art baselines (replay, regularization, and distillation methods) on both seen and unseen environments, demonstrating robust continual adaptation with language understanding and object localization. The approach offers a practical path toward real-world embodied agents that learn continuously from streaming experiences while preserving previously learned capabilities.

Abstract

In training an embodied agent to execute daily tasks via language directives, the literature largely assumes that the agent learns from all training data at the beginning. We argue that such a learning scenario is less realistic, since a robotic agent is supposed to learn about the world continuously as it explores and perceives it. To take a step towards a more realistic embodied agent learning scenario, we propose two continual learning setups for embodied agents: learning new behaviors (Behavior Incremental Learning, Behavior-IL) and learning new environments (Environment Incremental Learning, Environment-IL). For these tasks, previous 'data prior' based continual learning methods maintain logits for the past tasks. However, the stored information is often insufficiently learned and requires task boundary information, which might not always be available. Here, we propose to update the stored logits based on confidence scores, without task boundary information during training (i.e., task-free), in a moving average fashion; we name this method Confidence-Aware Moving Average (CAMA). In the proposed Behavior-IL and Environment-IL setups, our simple CAMA outperforms prior state of the art in our empirical validations by noticeable margins. The project page, including code, is https://github.com/snumprlab/cl-alfred.

Paper Structure

This paper contains 51 sections, 6 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: Proposed two incremental learning setups. In the 'Behavior Incremental' setup, the agent is expected to learn new behaviors while preserving previously learned knowledge. In the 'Environment Incremental' setup, the agent is expected to learn tasks in new environments with the preservation of previously learned knowledge. Note that each image in the figure denotes an expert demonstration (i.e., a sequence of frames with natural language instructions followed by a corresponding sequence of actions and object class labels).
  • Figure 2: Proposed Confidence-Aware Moving Average (CAMA). 'Current Logits' denotes the model's logits obtained from the input samples from the current stream and episodic memory. 'Previous Logits' denotes logits stored in episodic memory before an update. $Q_i$ denotes a queue that stores ground truth confidence scores acquired from the current logits, $y_1, y_2, ...$, for the $i^{th}$ class. To obtain $\gamma_i$, we maintain the $N$ most recent confidence scores for the $i^{th}$ class and calculate their mean, followed by a clip function. Finally, we use all $\gamma_i$'s to compute a class-wise weighted sum of the previously stored logits (i.e., 'Previous Logits') and the newly obtained logits from the current stream (i.e., 'Current Logits').
  • Figure 3: Confidence and accuracy of logits used for logit update in CAMA. 'Accuracy' denotes the mean of the frame-wise accuracies measured from the newly obtained logits (here, $z'_a$) in Equation \ref{eq:weighted_sum}. 'Confidence' denotes the dynamically determined coefficients (here, $\gamma_a$) for the update in Equation \ref{eq:mean_confidence_score}.
  • Figure 4: Qualitative analysis of the proposed method (Behavior-IL). The agent, having already acquired knowledge of the behavior $\tau_{j-1} = \textsc{Heat}$, proceeds to learn the new behavior $\tau_j = \textsc{Pick2\&Place}$. Subsequently, we evaluate the agent on the prior behavior $\tau_{j-1}$ to determine whether any forgetting has occurred. 'Irrelevant Action' denotes an action that results in incorrect navigation. 'Finetuning' fails to find a target object, 'Mug,' and eventually fails at the task. DER++ succeeds in navigating to and picking up the mug but fails to reach a microwave above the agent, also leading to task failure. In contrast, our CAMA further succeeds in reaching the microwave, heating the mug, and putting it back on the coffee machine, leading to task success.
  • Figure 5: Qualitative analysis of the proposed method (Environment-IL). The agent that has already acquired knowledge of the environment $e_{k-1} = \textsc{Bedrooms}$ proceeds to learn the new environment $e_k = \textsc{Bathrooms}$. We then assess the agent's capability in the prior environment $e_{k-1}$ to determine whether any forgetting has occurred. 'Irrelevant Action' denotes an action that results in incorrect navigation. 'Finetuning' fails to find a target object, 'CD,' eventually leading to task failure. DER++ can navigate to and pick up the CD, but fails to turn on the lamp. In contrast, our CAMA also turns on the lamp and completes the task.
  • ...and 1 more figure
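
The update described in the Figure 2 caption can be sketched in a few lines. This is a minimal illustration, not the authors' implementation (see the project repository for that). It assumes that the clip range is [0, 1], that $\gamma_i$ weights the newly obtained logits while $1 - \gamma_i$ weights the stored ones, and that a class with an empty queue keeps its stored logit. The class name `ConfidenceAwareMovingAverage`, its methods, and the `window` parameter (standing in for $N$) are hypothetical names chosen for this example.

```python
from collections import deque


class ConfidenceAwareMovingAverage:
    """Illustrative sketch of a CAMA-style logit update (not the authors' code)."""

    def __init__(self, num_classes, window=10):
        # One bounded queue Q_i per class; maxlen keeps only the N most recent scores.
        self.queues = [deque(maxlen=window) for _ in range(num_classes)]

    def observe(self, probs, label):
        """Record the confidence the model assigned to the ground-truth class."""
        self.queues[label].append(probs[label])

    def gamma(self):
        """Per-class coefficient: mean of recent scores, clipped to [0, 1] (assumed range)."""
        coeffs = []
        for q in self.queues:
            # Assumption: with no recorded scores, gamma is 0 and the old logit is kept.
            mean = sum(q) / len(q) if q else 0.0
            coeffs.append(min(1.0, max(0.0, mean)))
        return coeffs

    def update(self, previous_logits, current_logits):
        """Class-wise weighted sum of stored logits and newly computed logits."""
        return [
            (1.0 - g) * z_old + g * z_new
            for g, z_old, z_new in zip(self.gamma(), previous_logits, current_logits)
        ]


# Usage: two classes, window of two scores.
cama = ConfidenceAwareMovingAverage(num_classes=2, window=2)
cama.observe([0.5, 0.5], label=0)  # ground-truth class 0, confidence 0.5
cama.observe([1.0, 0.0], label=0)  # ground-truth class 0, confidence 1.0
updated = cama.update(previous_logits=[0.0, 4.0], current_logits=[2.0, 8.0])
print(updated)  # class 0 moves toward the new logit; class 1 keeps the stored one
```

Because the coefficients come from a running window of confidence scores rather than from task identifiers, the sketch needs no task boundary signal, which is the task-free property the abstract emphasizes.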