ADAM: An Embodied Causal Agent in Open-World Environments

Shu Yu, Chaochao Lu

TL;DR

ADAM is an embodied causal agent that can autonomously navigate the open world, perceive multimodal contexts, learn causal world knowledge, and tackle complex tasks through lifelong learning. It represents a novel paradigm that integrates causal methods and embodied agents in a synergistic manner.

Abstract

In open-world environments like Minecraft, existing agents face challenges in continuously learning structured knowledge, particularly causality. These challenges stem from the opacity inherent in black-box models and an excessive reliance on prior knowledge during training, which impair their interpretability and generalization capability. To this end, we introduce ADAM, An emboDied causal Agent in Minecraft, that can autonomously navigate the open world, perceive multimodal contexts, learn causal world knowledge, and tackle complex tasks through lifelong learning. ADAM is empowered by four key components: 1) an interaction module, enabling the agent to execute actions while documenting the interaction processes; 2) a causal model module, tasked with constructing an ever-growing causal graph from scratch, which enhances interpretability and diminishes reliance on prior knowledge; 3) a controller module, comprising a planner, an actor, and a memory pool, which uses the learned causal graph to accomplish tasks; 4) a perception module, powered by multimodal large language models, which enables ADAM to perceive like a human player. Extensive experiments show that ADAM constructs an almost perfect causal graph from scratch, enabling efficient task decomposition and execution with strong interpretability. Notably, in our modified Minecraft games where no prior knowledge is available, ADAM maintains its performance and shows remarkable robustness and generalization capability. ADAM pioneers a novel paradigm that integrates causal methods and embodied agents in a synergistic manner. Our project page is at https://opencausalab.github.io/ADAM.
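
To make the idea of task decomposition over a causal graph concrete, here is a minimal sketch in Python. It is not ADAM's planner: the graph below is a hand-written toy (ADAM learns its graph from interaction data), and the names `CAUSAL_PARENTS` and `decompose` are hypothetical. The sketch assumes the graph is acyclic and simply orders subgoals so that every prerequisite comes before the item that depends on it.

```python
# Toy causal graph: each item maps to the items it directly depends on.
# Hand-written for illustration only; ADAM learns such a graph from scratch.
CAUSAL_PARENTS = {
    "log": [],
    "planks": ["log"],
    "stick": ["planks"],
    "crafting_table": ["planks"],
    "wooden_pickaxe": ["planks", "stick", "crafting_table"],
    "cobblestone": ["wooden_pickaxe"],
    "stone_pickaxe": ["cobblestone", "stick", "crafting_table"],
    "raw_iron": ["stone_pickaxe"],
}


def decompose(target, parents):
    """Return subgoals ordered so that prerequisites precede dependents.

    Depth-first post-order traversal; assumes the graph has no cycles.
    """
    order, seen = [], set()

    def visit(item):
        if item in seen:
            return
        seen.add(item)
        for parent in parents.get(item, []):
            visit(parent)
        order.append(item)  # appended only after all of its prerequisites

    visit(target)
    return order


plan = decompose("raw_iron", CAUSAL_PARENTS)
print(plan)
```

Because the plan is read directly off the graph, each subgoal can be traced back to an explicit dependency edge, which is the kind of interpretability the abstract attributes to the causal model.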

Paper Structure

This paper contains 43 sections, 12 figures, 7 tables.

Figures (12)

  • Figure 1: An example of the questions in the MC-QA dataset.
  • Figure 1.1: (a) The technology tree for acquiring diamonds in the Minecraft game. Adam can precisely discover item dependencies from scratch. (b) Modified Minecraft technology tree, where the prior knowledge from the Internet or wiki does not align with the actual game dynamics. Red arrows denote removed dependencies, while blue arrows denote added dependencies. (c) In the game setting shown in (b), Adam maintains the ability to learn the correct causal graph and successfully obtains diamonds, whereas other methods can only acquire raw_iron within the step limit, and Adam achieves a 4.6$\times$ speedup in obtaining raw_iron compared to the SOTA.
  • Figure 1.2: Four key modules of Adam. The interaction module executes actions in the environment according to the task and records the processes. The causal model module identifies the causal relationship between items and actions to construct an ever-growing causal graph. The controller module implements task execution based on the learned causal graph. The perception module aligns the agent's behavior more closely with human gameplay.
  • Figure 3.1: The interaction module has two core functionalities: sampling and recording. Sampling involves executing actions in the environment, and recording involves processing and documenting the observable information. For instance, the initial action space is {gatherWoodLog}, whose name is not exposed to Adam and is denoted as $\{a\}$ here (the original notation {gatherWoodLog} is retained in the figure for illustrative purposes). The initial observed item space is $\varnothing$. After executing $a$ for one step, logs are obtained. A sampling can be represented as $(a, \varnothing, \{\text{log}\})$, where $\varnothing$ is the initial inventory and $\{\text{log}\}$ is the inventory after this step. The result is recorded as $R = (\varnothing, \varnothing, \{\text{log}\})$, where the first $\varnothing$ is the initial inventory, the second $\varnothing$ indicates that no items are consumed, and $\{\text{log}\}$ represents the items that are obtained. After sampling $N$ times, the data $D_a = \{R_1,\ldots,R_N\}$ is provided to the causal model module for causal discovery (CD). If the causal relation fails to be identified, $a$ is resampled; if identification succeeds, new actions such as craftPlanks are enabled by the acquisition of logs, and the observed item space is updated to $\{\text{log}\}$.
  • Figure 3.2: LLM-based CD performs causal reasoning under the guidance of the prompt. Role Playing assigns an analysis assistant role to the LLM. Problem Setting provides the reasoning task. Letter Mapping maps the item names to letters for accurate output. Few-shot Prompting provides examples for chain-of-thought (Wei et al., 2022) reasoning. Data $D_a$ is presented in the same form as the few-shot examples. The output of the LLM serves as the causal assumption.
  • ...and 7 more figures
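
The recording step described in the Figure 3.1 caption can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the `Record` type and the `record` function are hypothetical names, and inventories are modeled as item counts (`collections.Counter`) rather than the plain sets used in the caption, so that consumed and obtained quantities fall out of a simple difference.

```python
from collections import Counter
from typing import NamedTuple


class Record(NamedTuple):
    """One recorded sample R = (initial inventory, consumed, obtained)."""
    initial: Counter
    consumed: Counter
    obtained: Counter


def record(before, after):
    """Derive a record from the inventories before and after one action step.

    Counter subtraction keeps only positive counts, so `before - after`
    yields items that were used up and `after - before` yields new items.
    """
    before, after = Counter(before), Counter(after)
    return Record(initial=before, consumed=before - after, obtained=after - before)


# Executing the first action from an empty inventory yields a log,
# matching R = (empty, empty, {log}) in the caption.
r1 = record({}, {"log": 1})

# A later action such as craftPlanks consumes the log (quantities are illustrative).
r2 = record({"log": 1}, {"planks": 4})

print(r1)
print(r2)
```

Repeating this for $N$ samples of the same action gives the data set $D_a = \{R_1,\ldots,R_N\}$ that the caption says is passed to the causal model module.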