Experience-based Knowledge Correction for Robust Planning in Minecraft

Seungjoon Lee, Suhwan Kim, Minhyeon Oh, Youngsik Yoon, Jungseul Ok

TL;DR

XENON addresses the problem of flawed planning priors in LLM-based Minecraft agents by introducing experience-based algorithmic knowledge correction. It combines an Adaptive Dependency Graph (ADG) and a Failure-aware Action Memory (FAM) to revise dependencies and actions from binary success/failure feedback, rather than relying on LLM self-correction. Across MineRL, Mineflayer, MC-TextWorld, and related benchmarks, XENON with a 7B open-weight LLM outperforms baselines built on larger proprietary models and remains robust to hallucinations and environmental perturbations. The work demonstrates that external memory-based correction can enable reliable long-horizon planning with lightweight LLMs, suggesting a practical path for embodied agents using accessible models.

Abstract

Large Language Model (LLM)-based planning has advanced embodied agents in long-horizon environments such as Minecraft, where acquiring latent knowledge of goal (or item) dependencies and feasible actions is critical. However, LLMs often begin with flawed priors and fail to correct them through prompting, even with feedback. We present XENON (eXpErience-based kNOwledge correctioN), an agent that algorithmically revises knowledge from experience, enabling robustness to flawed priors and sparse binary feedback. XENON integrates two mechanisms: Adaptive Dependency Graph, which corrects item dependencies using past successes, and Failure-aware Action Memory, which corrects action knowledge using past failures. Together, these components allow XENON to acquire complex dependencies despite limited guidance. Experiments across multiple Minecraft benchmarks show that XENON outperforms prior agents in both knowledge learning and long-horizon planning. Remarkably, with only a 7B open-weight LLM, XENON surpasses agents that rely on much larger proprietary models. Code available at https://sjlee-me.github.io/XENON

Paper Structure

This paper contains 90 sections, 1 equation, 25 figures, 21 tables, 4 algorithms.

Figures (25)

  • Figure 1: An LLM exhibits flawed planning knowledge and fails at self-correction. (b) The dependency graph predicted by Qwen2.5-VL-7B contains multiple errors (e.g., missed dependencies, hallucinated items) compared to (a) the ground truth. (c, d) The LLM fails to correct its flawed knowledge about dependencies and actions from failure feedback, often repeating the same errors. See \ref{sec:correction_prompt_qualitative} for the full prompts and examples of the LLM's self-correction.
  • Figure 2: Overview. XENON updates Adaptive Dependency Graph and Failure-aware Action Memory with environmental experiences.
  • Figure 3: Overview. XENON updates Adaptive Dependency Graph and Failure-aware Action Memory with environmental experiences.
  • Figure 4: XENON’s algorithmic knowledge correction. (a) Dependency Correction via RevisionByAnalogy. Case 1: For an inadmissible item (e.g., a hallucinated item), its descendants are recursively revised to remove the flawed dependency. Case 2: A flawed requirement set is revised by referencing similar, obtained items. (b) Action Correction via FAM. FAM prunes invalid actions from the LLM's prompt based on failures, guiding it to select an under-explored action.
  • Figure 5: Robustness against flawed prior knowledge. EGA over 400 episodes in (a) MineRL and (b) Mineflayer. XENON consistently outperforms the baselines.
  • ...and 20 more figures
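
The action-correction idea in Figure 4(b) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name, the `record` and `candidate_actions` methods, the action names, and the failure threshold are all hypothetical, chosen only to show how binary success/failure feedback could prune actions before they are offered to the LLM and push the choice toward under-explored ones.

```python
class FailureAwareActionMemory:
    """Toy per-item memory of action outcomes (hypothetical interface)."""

    def __init__(self, actions, failure_threshold=2):
        # `failure_threshold` is an assumed hyperparameter, not taken from the paper.
        self.actions = list(actions)
        self.failure_threshold = failure_threshold
        self.failures = {}   # (item, action) -> number of recorded failures
        self.successes = {}  # item -> action that has succeeded for it

    def record(self, item, action, success):
        """Store one binary outcome for trying `action` to obtain `item`."""
        if success:
            self.successes[item] = action
        else:
            key = (item, action)
            self.failures[key] = self.failures.get(key, 0) + 1

    def candidate_actions(self, item):
        """Return the actions that would still be offered in the prompt."""
        # A known-good action short-circuits exploration.
        if item in self.successes:
            return [self.successes[item]]
        # Prune actions that have failed too often for this item.
        valid = [
            a for a in self.actions
            if self.failures.get((item, a), 0) < self.failure_threshold
        ]
        # If everything has been pruned, fall back to the full action set.
        if not valid:
            valid = list(self.actions)
        # Least-failed first, so under-explored actions are preferred.
        return sorted(valid, key=lambda a: self.failures.get((item, a), 0))


fam = FailureAwareActionMemory(["mine", "craft", "smelt"], failure_threshold=2)
fam.record("iron_ingot", "craft", success=False)
fam.record("iron_ingot", "craft", success=False)
after_craft_failures = fam.candidate_actions("iron_ingot")

fam.record("iron_ingot", "mine", success=False)
after_mine_failure = fam.candidate_actions("iron_ingot")

fam.record("iron_ingot", "smelt", success=True)
after_success = fam.candidate_actions("iron_ingot")
```

In this sketch the pruning happens outside the LLM: the model only ever sees the filtered candidate list, so it cannot repeat an action that the memory has already ruled out.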