Devil's Advocate: Anticipatory Reflection for LLM Agents

Haoyu Wang, Tao Li, Zhiwei Deng, Dan Roth, Yang Li

TL;DR

The experimental results suggest that the introspection-driven approach not only enhances the agent's ability to navigate unanticipated challenges through a robust mechanism of plan execution, but also improves efficiency by reducing the number of trials and plan revisions needed to achieve a task by 45%.

Abstract

In this work, we introduce a novel approach that equips LLM agents with introspection, enhancing consistency and adaptability in solving complex tasks. Our approach prompts LLM agents to decompose a given task into manageable subtasks (i.e., to make a plan), and to continuously introspect upon the suitability and results of their actions. We implement a three-fold introspective intervention: 1) anticipatory reflection on potential failures and alternative remedies before action execution, 2) post-action alignment with subtask objectives and backtracking with a remedy to ensure utmost effort in plan execution, and 3) comprehensive review upon plan completion for future strategy refinement. Deployed as a zero-shot approach within WebArena on practical tasks in web environments, our agent achieves a success rate of 23.5%, outperforming existing zero-shot methods by 3.5%. The experimental results suggest that our introspection-driven approach not only enhances the agent's ability to navigate unanticipated challenges through a robust mechanism of plan execution, but also improves efficiency by reducing the number of trials and plan revisions needed to achieve a task by 45%.

Paper Structure

This paper contains 23 sections, 9 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Conceptual difference between our anticipatory reflection and regular reflection. Circles denote states and arrows denote actions. At each branching point, our method not only yields the next action, but also anticipates a potential error associated with it and plans for backups. In contrast, regular reflection performs trials sequentially, correcting one error per pass.
  • Figure 2: An example plan with 5 subtasks, generated by GPT-4. Subtasks are generated based on the first observation $\mathcal{S}_0$ and prior knowledge about web operation.
  • Figure 3: Distribution of WebArena tasks by the number of subtasks within each task. Most tasks have 4 to 9 subtasks, and the distribution has a long tail.
  • Figure 4: Screen observation at one step in solving the subtask "Click on the order details link for the order from November 2022". The agent might decide to click ($a_t$) on the "View Order" button of any one of the three Nov 2022 orders to see if a picture frame was purchased in that order, and it is highly probable that backtracking will be needed to view the details of the other two orders (if the first order chosen does not contain a picture frame). In our proposed approach, the other two alternative clicking actions $[a_t^{1}, a_t^{2}]$ are pushed onto a stack before the agent executes action $a_t$.
  • Figure 5: Decision-making process of our agent in solving the task "What is the color configuration of the picture frame that I bought in Sep 2022?" Before executing the predicted action, the agent asks itself a follow-up question about its decision: "What if the picture frame is not in order #179? What should be the alternative remedy?" After finding out that order #179 contains no picture frame at all, the agent backtracks to the previous state to view order #175 and continues.
  • ...and 2 more figures
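
The stack-and-backtrack mechanism described in the captions of Figures 4 and 5 can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `act_with_remedies`, the callbacks `execute` and `is_aligned`, the toy `ORDERS` data, and the third order ID are all assumptions made up for illustration. In the paper an LLM proposes the action and its remedies and judges alignment; here plain functions stand in for those calls.

```python
from typing import Callable, List, Optional, Tuple

State = str
Action = str


def act_with_remedies(
    state: State,
    candidates: List[Action],
    execute: Callable[[State, Action], State],
    is_aligned: Callable[[State], bool],
) -> Tuple[Optional[State], List[Action]]:
    """Run the top-ranked action; on misalignment, backtrack to a saved remedy."""
    action, remedies = candidates[0], candidates[1:]
    # Anticipatory step: save the alternatives BEFORE acting.
    # Reversed so that the highest-ranked remedy is popped first.
    stack = [(state, alt) for alt in reversed(remedies)]
    tried: List[Action] = []
    while True:
        tried.append(action)
        result = execute(state, action)
        if is_aligned(result):  # post-action alignment check
            return result, tried
        if not stack:  # no remedies left: give up on this step
            return None, tried
        state, action = stack.pop()  # backtrack to the saved state


# Toy data (hypothetical): order contents and the ID "#170" are made up.
ORDERS = {"#179": ["desk lamp"], "#175": ["picture frame"], "#170": ["phone case"]}


def execute(state: State, action: Action) -> State:
    # Stand-in for a browser click on "View Order".
    order_id = action.split(" ", 1)[1]
    return f"{order_id}: {', '.join(ORDERS[order_id])}"


def is_aligned(observation: State) -> bool:
    # Stand-in for the LLM's check against the subtask objective.
    return "picture frame" in observation


result, tried = act_with_remedies(
    "order list", ["view #179", "view #175", "view #170"], execute, is_aligned
)
print(result, tried)
```

In this toy run the first click opens an order without a picture frame, so the agent pops the next remedy and restores the saved state instead of restarting the whole trial, which is the behavior the captions contrast with sequential reflection.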