
AutoManual: Constructing Instruction Manuals by LLM Agents via Interactive Environmental Learning

Minghao Chen, Yihang Li, Yanting Yang, Shiyu Yu, Binbin Lin, Xiaofei He

TL;DR

This work introduces AutoManual, a framework that enables LLM agents to autonomously build their understanding through interaction and adapt to new environments. It also proposes a *case-conditioned prompting* strategy that helps the Builder mitigate hallucinations when managing rules.

Abstract

Large Language Model (LLM) based agents have shown promise in autonomously completing tasks across various domains, e.g., robotics, games, and web navigation. However, these agents typically require elaborate design and expert prompts to solve tasks in specific domains, which limits their adaptability. We introduce AutoManual, a framework enabling LLM agents to autonomously build their understanding through interaction and adapt to new environments. AutoManual categorizes environmental knowledge into diverse rules and optimizes them online with two agents: 1) The Planner codes actionable plans based on current rules for interacting with the environment. 2) The Builder updates the rules through a well-structured rule system that facilitates online rule management and essential detail retention. To mitigate hallucinations in managing rules, we introduce a *case-conditioned prompting* strategy for the Builder. Finally, the Formulator agent compiles these rules into a comprehensive manual. The self-generated manual not only improves adaptability but also guides the planning of smaller LLMs while remaining human-readable. Given only one simple demonstration, AutoManual significantly improves task success rates, achieving 97.4% with GPT-4-turbo and 86.2% with GPT-3.5-turbo on ALFWorld benchmark tasks. The code is available at https://github.com/minghchen/automanual.
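The division of labor among the Planner, Builder, and Formulator can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Rule`, `RuleSystem`, and `build_manual` are hypothetical, and the three agents are passed in as plain callables, where a real system would presumably wrap LLM calls and an environment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Rule:
    rule_id: int
    text: str


@dataclass
class RuleSystem:
    """Hypothetical stand-in for the online rule system (not the paper's code)."""
    rules: Dict[int, Rule] = field(default_factory=dict)
    _next_id: int = 0

    def add(self, text: str) -> int:
        # Store a new rule and return its identifier.
        rule_id = self._next_id
        self.rules[rule_id] = Rule(rule_id, text)
        self._next_id += 1
        return rule_id

    def update(self, rule_id: int, text: str) -> None:
        # Overwrite the text of an existing rule.
        self.rules[rule_id].text = text

    def texts(self) -> List[str]:
        return [rule.text for rule in self.rules.values()]


def build_manual(
    tasks: List[str],
    planner: Callable[[str, List[str]], str],
    builder: Callable[[str, RuleSystem], None],
    formulator: Callable[[List[str]], str],
) -> str:
    """Run the online loop over tasks, then compile the rules into a manual."""
    rule_system = RuleSystem()
    for task in tasks:
        # The Planner acts using the current rules and returns a trajectory.
        trajectory = planner(task, rule_system.texts())
        # The Builder edits the rule system based on that trajectory.
        builder(trajectory, rule_system)
    # The Formulator turns the final rules into a single document.
    return formulator(rule_system.texts())


# Toy agents, used only to exercise the control flow.
def toy_planner(task: str, rules: List[str]) -> str:
    return f"task={task}; rules_seen={len(rules)}"


def toy_builder(trajectory: str, rule_system: RuleSystem) -> None:
    rule_system.add(f"Observed: {trajectory}")


def toy_formulator(rules: List[str]) -> str:
    return "\n".join(f"- {rule}" for rule in rules)


manual = build_manual(
    ["put apple in fridge", "heat mug"],
    toy_planner,
    toy_builder,
    toy_formulator,
)
print(manual)
```

The point of the sketch is the ordering: rules are updated after every task, so later Planner calls see rules learned from earlier trajectories, and the manual is compiled only once at the end.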

Paper Structure (33 sections, 2 equations, 6 figures, 8 tables)

Figures (6)

  • Figure 1: AutoManual Overview: AutoManual operates in three stages: (1) Building Stage: The Planner agent interacts with the environment by coding actionable plans. After receiving the current trajectory of the Planner, the Builder agent manages rules through the online rule system. (2) Formulating Stage: The Formulator agent formulates the resulting rules into a Markdown manual. (3) Testing Stage: A test-time Planner agent utilizes the manual to complete testing tasks.
  • Figure 2: The Planner Trajectory: Given the current task and rules, the Planner will interact with the environment through free-form code. Based on the trajectory result, the Planner will generate a corresponding conclusion, which will be saved in the skill or reflection library.
  • Figure 3: Case-Conditioned Prompts: Given the current trajectory, the Builder classifies the cause of the major error as "Imperfect Rules" or "Imperfect Agents". The Builder then receives the base prompt together with the prompt corresponding to that case to guide its rule management.
  • Figure 4: (a) The success rate curve with standard deviation when testing GPT-4-turbo or GPT-3.5-turbo on ALFWorld. Building is performed either across task types (cross-task) or on a single task type. (b) The success rate curve with standard deviation using AutoManual or Planner+Lib. when testing with GPT-4-turbo or GPT-3.5-turbo on 9 task types with feedback in MiniWob++.
  • Figure 5: The Generated Manual for ALFWorld: Part 1.
  • ...and 1 more figures
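
The case-conditioned prompting described in the Figure 3 caption can be sketched as a two-step prompt selection: classify the cause of the major error, then append the matching case prompt to the base prompt. The prompt strings below are placeholders rather than the paper's actual prompts, and `compose_builder_prompt` and the `classify` callable are hypothetical names. Only the two failure cases named in the caption are modeled.

```python
from typing import Callable, Dict

# Placeholder prompt texts (assumptions, not taken from the paper).
BASE_PROMPT = "You manage the rule system based on the Planner's trajectory."
CASE_PROMPTS: Dict[str, str] = {
    "Imperfect Rules": "The rules were missing or wrong; add or revise rules.",
    "Imperfect Agents": "The rules were adequate but not followed; "
                        "clarify or emphasize the relevant rules.",
}


def compose_builder_prompt(
    trajectory: str,
    classify: Callable[[str], str],
) -> str:
    """Select the case prompt for a trajectory and attach it to the base prompt."""
    # In a real system the classification would be an LLM call.
    case = classify(trajectory)
    if case not in CASE_PROMPTS:
        raise ValueError(f"Unknown case: {case!r}")
    return f"{BASE_PROMPT}\n\n[{case}]\n{CASE_PROMPTS[case]}"


# Toy classifier used only for illustration.
def toy_classify(trajectory: str) -> str:
    if "ignored rule" in trajectory:
        return "Imperfect Agents"
    return "Imperfect Rules"


prompt = compose_builder_prompt("agent ignored rule 3", toy_classify)
print(prompt)
```

Conditioning on the case narrows what the Builder is asked to do in each call, which is the mechanism the caption credits with guiding rule management.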