E2CL: Exploration-based Error Correction Learning for Embodied Agents

Hanlin Wang, Chak Tou Leong, Jian Wang, Wenjie Li

TL;DR

Inspired by human learning, E2CL is a novel framework that leverages exploration-induced errors and environmental feedback to enhance environment alignment for embodied agents. E2CL-trained agents outperform those trained by baseline methods and exhibit superior self-correction capabilities.

Abstract

Language models are exhibiting increasing capability in knowledge utilization and reasoning. However, when applied as agents in embodied environments, they often suffer from misalignment between their intrinsic knowledge and environmental knowledge, leading to infeasible actions. Traditional environment alignment methods, such as supervised learning on expert trajectories and reinforcement learning, encounter limitations in covering environmental knowledge and achieving efficient convergence, respectively. Inspired by human learning, we propose Exploration-based Error Correction Learning (E2CL), a novel framework that leverages exploration-induced errors and environmental feedback to enhance environment alignment for embodied agents. E2CL incorporates teacher-guided and teacher-free explorations to gather environmental feedback and correct erroneous actions. The agent learns to provide feedback and self-correct, thereby enhancing its adaptability to target environments. Extensive experiments in the VirtualHome environment demonstrate that E2CL-trained agents outperform those trained by baseline methods and exhibit superior self-correction capabilities.
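The inference-time behavior described above (the agent gives feedback on its own action, then corrects it) can be illustrated with a minimal sketch. This is not the authors' implementation: the function `act_with_self_correction`, the three callables, the `max_rounds` cap, and the toy fridge rules are all hypothetical stand-ins for the trained language model, chosen only to show the control flow of a propose, self-feedback, correct loop.

```python
from typing import Callable, List, Optional


def act_with_self_correction(
    propose: Callable[[str, List[str]], str],
    self_feedback: Callable[[str, List[str], str], Optional[str]],
    correct: Callable[[str, List[str], str, str], str],
    task: str,
    history: List[str],
    max_rounds: int = 3,  # hypothetical cap on correction attempts
) -> str:
    """Propose an action, then revise it while self-feedback flags a problem."""
    action = propose(task, history)
    for _ in range(max_rounds):
        feedback = self_feedback(task, history, action)
        if feedback is None:  # no predicted error: keep the action
            break
        action = correct(task, history, action, feedback)
    return action


# Toy stand-ins (assumptions, not from the paper): milk can only be
# grabbed after the fridge has been opened.
def toy_propose(task: str, history: List[str]) -> str:
    return "grab milk"


def toy_feedback(task: str, history: List[str], action: str) -> Optional[str]:
    if action == "grab milk" and "open fridge" not in history:
        return "the fridge is closed"
    return None


def toy_correct(task: str, history: List[str], action: str, feedback: str) -> str:
    return "open fridge"


first = act_with_self_correction(toy_propose, toy_feedback, toy_correct, "get milk", [])
second = act_with_self_correction(toy_propose, toy_feedback, toy_correct, "get milk", [first])
```

In this toy run the first call revises the infeasible "grab milk" into "open fridge", and the second call, with the fridge already opened in the history, keeps "grab milk" unchanged. In the framework described in the abstract, a single trained agent plays all three roles rather than hand-written rules.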

Paper Structure (31 sections, 3 equations, 9 figures, 6 tables, 1 algorithm)

This paper contains 31 sections, 3 equations, 9 figures, 6 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Traditional "success only learning" relies on imitating provided expert behaviors, limiting comprehensiveness. Our proposed exploration-based error correction learning (E$^2$CL) framework enhances learning by incorporating exploration-induced errors and environmental feedback during training, leading to better alignment with target environments. During inference, the agent utilizes the learned abilities to conduct self-feedback for continuous self-correction.
  • Figure 2: Overview of the proposed Exploration-based Error Correction Learning (E$^2$CL) framework.
  • Figure 3: Task-solving performance of the agent on unseen tasks based on different sizes of LM and different training methods.
  • Figure 4: Comparison of self-correction capability between our method and other baseline methods.
  • Figure 5: Task-solving performance of the agent on unseen tasks when fed with different types of feedback. Random: a feedback type selected at random from all available feedback. Boolean: a ground-truth Boolean signal indicating whether the initial action is executable. Ours: the self-generated feedback on the initial action used in our method. Ground truth: the ground-truth feedback from the environment.
  • ...and 4 more figures