CLEA: Closed-Loop Embodied Agent for Enhancing Task Execution in Dynamic Environments

Mingcong Lei, Ge Wang, Yiming Zhao, Zhixin Mai, Qing Zhao, Yao Guo, Zhen Li, Shuguang Cui, Yatong Han, Jinke Ren

TL;DR

The paper addresses robust long-horizon task execution for embodied agents in dynamic environments, where purely open-loop LLM plans can fail once the environment changes. It introduces CLEA, a closed-loop embodied agent that decouples observation, memory, planning, and evaluation across four open-source models (a VLM observer, an LLM summarizer, an LLM planner, and a VLM critic), enabling memory-informed, adaptive replanning. Key contributions include a memory-driven belief state, a planner-critic loop for online feasibility checks, and real-world validation with two robots in a kitchen, where CLEA improves success rate by 67.3% and task completion rate by 52.8% over the baseline across 12 task trials. The work advances practical multimodal planning for real-world robotics and provides a reusable, open-source framework for robust embodied intelligence.

Abstract

Large Language Models (LLMs) exhibit remarkable capabilities in the hierarchical decomposition of complex tasks through semantic reasoning. However, their application in embodied systems faces challenges in ensuring reliable execution of subtask sequences and achieving one-shot success in long-term task completion. To address these limitations in dynamic environments, we propose Closed-Loop Embodied Agent (CLEA) -- a novel architecture incorporating four specialized open-source LLMs with functional decoupling for closed-loop task management. The framework features two core innovations: (1) Interactive task planner that dynamically generates executable subtasks based on the environmental memory, and (2) Multimodal execution critic employing an evaluation framework to conduct a probabilistic assessment of action feasibility, triggering hierarchical re-planning mechanisms when environmental perturbations exceed preset thresholds. To validate CLEA's effectiveness, we conduct experiments in a real environment with manipulable objects, using two heterogeneous robots for object search, manipulation, and search-manipulation integration tasks. Across 12 task trials, CLEA outperforms the baseline model, achieving a 67.3% improvement in success rate and a 52.8% increase in task completion rate. These results demonstrate that CLEA significantly enhances the robustness of task planning and execution in dynamic environments.
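The planner-critic loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `Verdict`, `execute_with_critic`, the `0.5` threshold, and the `max_replans` cap are hypothetical names and values chosen for illustration, and the critic and re-planner are passed in as plain callables where CLEA would call its models. The sketch also assumes that "perturbations exceed preset thresholds" can be modeled as the critic's feasibility estimate falling below a threshold.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    feasibility: float  # assumed: critic's probability estimate in [0, 1]
    advice: str = ""    # assumed: free-text re-plan recommendation


def execute_with_critic(
    plan: List[str],
    critic: Callable[[str], Verdict],
    replan: Callable[[str, str], List[str]],
    threshold: float = 0.5,
    max_replans: int = 3,
) -> Tuple[List[str], int]:
    """Run subtasks in order; re-plan when feasibility drops below threshold."""
    queue = list(plan)
    executed: List[str] = []
    replans = 0
    while queue:
        action = queue.pop(0)
        verdict = critic(action)
        if verdict.feasibility < threshold:
            if replans == max_replans:
                break  # give up rather than loop forever
            replans += 1
            # Replace the remaining plan with the re-planner's new sequence.
            queue = list(replan(action, verdict.advice))
            continue
        executed.append(action)  # stand-in for dispatching to the robot
    return executed, replans
```

Checking every action before it is dispatched, instead of only detecting failure afterwards, is what makes the loop closed: an infeasible step is replaced before the robot attempts it.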

Paper Structure

This paper contains 13 sections, 6 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 2: Overview of CLEA. The observer (VLM) provides environmental data, which the summarizer (LLM) processes into memory. The planner (LLM) generates an initial action sequence based on the robot's skill pool and memory, while the critic (VLM) evaluates action feasibility and offers re-plan recommendations in response to environmental dynamics.
  • Figure 3: The reasoning and output of CLEA. Unlike traditional failure-detection classification systems, CLEA performs internal reasoning upon receiving visual input and provides structured outputs. In the case where no medication is found in an empty drawer, the planner does not halt its intent. Instead, the critic suggests exploring alternative locations and provides the correct advice to check other compartments of the drawer, thereby guiding the successful completion of the task.
  • Figure 4: Visualization of the experimental environment.
  • Figure 5: Comparisons among the CLEA, the ablation, and the baseline agent.
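
The memory path in the Figure 2 caption (observer report, summarizer, memory, planner) can be sketched in Python using the drawer scenario from Figure 3. This is a rough illustration under stated assumptions: `Memory`, `summarize`, and `plan_next` are hypothetical names, and simple rule-based functions stand in for the LLM and VLM calls the paper describes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Memory:
    """Hypothetical belief state: objects seen at each searched location."""
    seen: Dict[str, List[str]] = field(default_factory=dict)


def summarize(memory: Memory, location: str, observation: List[str]) -> Memory:
    """Summarizer stand-in: fold one observer report into memory."""
    memory.seen[location] = list(observation)
    return memory


def plan_next(memory: Memory, target: str, locations: List[str]) -> Optional[str]:
    """Planner stand-in: grasp the target if memory places it somewhere,
    otherwise search the first location not yet recorded in memory."""
    for loc, objects in memory.seen.items():
        if target in objects:
            return f"grasp {target} at {loc}"
    for loc in locations:
        if loc not in memory.seen:
            return f"search {loc}"
    return None  # every location searched and the target was not found


memory = Memory()
compartments = ["upper drawer", "lower drawer"]
summarize(memory, "upper drawer", [])  # observer reports an empty compartment
next_action = plan_next(memory, "medication", compartments)
```

Because the empty compartment is recorded in memory, the next plan moves on to the other compartment instead of halting or repeating the same search.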