$How^{2}$: How to learn from procedural How-to questions

Gautier Dagan, Frank Keller, Alex Lascarides

TL;DR

The paper introduces $How^{2}$, a memory-augmented framework for lifelong learning from procedural how-to questions that aims to improve planning in interactive environments. By decoupling teacher guidance from the current state through abstraction and parsing, it makes memory entries reusable across tasks. In Plancraft, fully executable teacher answers yield high immediate success, while abstracted subgoals improve long-term reuse; the complete $How^{2}$ pipeline balances immediate utility against autonomous learning. These results demonstrate the potential of memory-guided planning for LLM-based agents in complex, open-ended domains.

Abstract

An agent facing a planning problem can use answers to how-to questions to reduce uncertainty and fill knowledge gaps, helping it solve both current and future tasks. However, the open-ended nature of such questions, where valid answers to "How do I X?" range from executable actions to high-level descriptions of X's sub-goals, makes them challenging for AI agents to ask, and for AI experts to answer, in ways that support efficient planning. We introduce $How^{2}$, a memory agent framework that enables agents to ask how-to questions, store the answers, and reuse them for lifelong learning in interactive environments. We evaluate our approach in Plancraft, a Minecraft crafting environment, where agents must complete an assembly task by manipulating inventory items. Using teacher models that answer at varying levels of abstraction, from executable action sequences to high-level subgoal descriptions, we show that lifelong learning agents benefit most from answers that are abstracted and decoupled from the current state. $How^{2}$ offers a way for LLM-based agents to improve their planning capabilities over time by asking questions in interactive environments.
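The ask, store, and reuse loop described in the abstract (and in steps 1 to 6 of Figure 2) can be sketched as a small key-value memory. This is a minimal Python sketch under stated assumptions, not the authors' implementation: exact-key lookup on θ stands in for the search-based retrieval the paper describes, the question template "How do I θ?" is assumed, and the relevance check, teacher, and parser are hypothetical callables supplied by the caller. The hit and miss counters mirror the cache hits and misses reported in Figure 5.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class HowToMemory:
    """Key-value memory keyed by the agent's query theta (illustrative only).

    The three callables are hypothetical interfaces, not the How^2 API:
    exact-key lookup on theta is an assumed simplification of retrieval.
    """

    is_relevant: Callable[[str, dict], bool]  # step 2: entry vs. current state
    ask_teacher: Callable[[str, dict], str]   # steps 3-4: pose how-to question
    parse: Callable[[str], str]               # step 5: decouple from state
    store: dict[str, list[str]] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def read(self, theta: str, state: dict) -> str:
        # Steps 1-2: return the first stored entry that passes the relevance check.
        for entry in self.store.get(theta, []):
            if self.is_relevant(entry, state):
                self.hits += 1
                return entry
        # Steps 3-5: nothing usable, so ask the teacher and abstract the answer.
        self.misses += 1
        answer = self.parse(self.ask_teacher(f"How do I {theta}?", state))
        # Step 6: store the parsed answer under theta and hand it back.
        self.store.setdefault(theta, []).append(answer)
        return answer
```

Because every answer passes through `parse` before it is stored, later lookups under the same θ receive the state-independent version rather than the teacher's raw reply.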

Paper Structure

This paper contains 38 sections, 6 equations, 24 figures, 11 tables, 1 algorithm.

Figures (24)

  • Figure 1: We solve a Minecraft planning task through a lifelong mechanism in a student/teacher setup. We use a memory to store procedural answers to how-to questions. Our $How^2$ framework abstracts the executable plans to decouple the teacher's answers from the game state and generalise memory entries for reuse.
  • Figure 2: Our proposed $How^2$ agent framework for lifelong learning with external knowledge from a teacher. 1) The agent can call a read-memory tool, which queries the memory module with a query $\theta$. The memory is a key-value mapping that retrieves and indexes memories given the search query $\theta$. 2) If nothing is stored under $\theta$, or all retrieved memories fail a relevance check against the current state, 3) the agent asks the teacher a how-to question. 4) The teacher answers the question at one of several levels of executability. 5) The answer is parsed to decouple it from the current state and generalise the instructions. 6) The parsed answer is stored under $\theta$ in memory and returned to the main agent.
  • Figure 3: The executable teacher returns a full plan that is conditioned on the current inventory, with the inventory locations instantiated. The subgoal-partially-executable teacher returns instructions where the inventory slots are not specified and decomposes each subtask into identifiable subgoals. This generalises to unseen inventories because the crafting patterns remain the same. Lastly, the non-executable teacher returns an entirely ungrounded plan and instead uses pattern abstractions such as shapes and relative positions.
  • Figure 4: Bar chart showing the success rate of the different teacher types in Just Ask. When the teacher is invoked at the beginning of the episode, the success rate is significantly higher than when it is called later. This is consistent across all teacher types. The executable teacher outperforms all other teachers, especially if called after the first action.
  • Figure 5: Heat map of the executable teacher's performance in each setup. We show the success rate (colour) and counts (values) per number of cache misses and cache hits. This highlights the effectiveness of $How^2$ in improving agent performance by filtering irrelevant memories, but also the trade-off between cache hits and success.
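Figure 3 contrasts executable answers, which name concrete inventory slots, with subgoal-level answers that omit them. The following sketch illustrates one hypothetical way to perform that decoupling (step 5 in Figure 2): slot-specific moves are rewritten so that inventory slots are replaced by the items they hold, while crafting-grid positions, which the recipe fixes, are kept. The action string format and slot names ([0] output, [A1] to [C3] grid, [I1] to [I36] inventory) are assumptions modelled loosely on Plancraft, and `abstract_plan` is an illustrative helper, not part of $How^{2}$.

```python
import re

# Assumed action format (hypothetical for this sketch):
#   "move: from [I17] to [A1] with quantity 1"
# Assumed slots: [0] = crafting output, [A1]..[C3] = 3x3 grid, [I1].. = inventory.
MOVE_RE = re.compile(
    r"move: from \[(?P<src>\w+)\] to \[(?P<dst>\w+)\] with quantity (?P<qty>\d+)"
)


def is_grid(slot: str) -> bool:
    """True for crafting-grid slots such as A1 or C3."""
    return len(slot) == 2 and slot[0] in "ABC" and slot[1] in "123"


def abstract_plan(actions: list[str], inventory: dict[str, str]) -> list[str]:
    """Rewrite slot-specific moves into inventory-independent instructions.

    Inventory slots are replaced by the item they currently hold, so the
    instruction still applies when that item sits in another slot later.
    Crafting-grid slots are kept because the recipe pattern fixes them.
    """
    abstracted = []
    for action in actions:
        m = MOVE_RE.fullmatch(action.strip())
        if m is None:
            abstracted.append(action)  # keep anything we cannot parse
            continue
        src, dst, qty = m["src"], m["dst"], int(m["qty"])
        if src.startswith("I") and is_grid(dst):
            item = inventory.get(src, "unknown item")
            abstracted.append(f"place {qty} {item} in [{dst}]")
        elif src == "0" and dst.startswith("I"):
            abstracted.append(f"take {qty} crafted item(s) from the output slot")
        else:
            abstracted.append(action)  # other moves are left unchanged
    return abstracted
```

For example, with `inventory = {"I17": "oak_planks"}`, the move `move: from [I17] to [A1] with quantity 1` becomes `place 1 oak_planks in [A1]`, which remains valid when the planks sit in a different inventory slot in a later episode.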