Towards Autonomous Agents: Adaptive-planning, Reasoning, and Acting in Language Models

Abhishek Dutta, Yen-Che Hsiao

TL;DR

The results show that the gemma-2-9b-it language model, using the proposed method, completes two of the six tasks that it failed on the first attempt. This indicates that self-correction can improve the problem-solving capability of a single language model.

Abstract

We propose a novel in-context learning algorithm for building autonomous decision-making language agents. The language agent repeatedly attempts the same task, self-correcting each time an attempt fails. Our selected language agent demonstrates the ability to solve tasks in a text-based game environment. Our results show that the gemma-2-9b-it language model, using our proposed method, completes two of the six tasks that it failed on the first attempt. This highlights the effectiveness of our approach in enhancing the problem-solving capabilities of a single language model through self-correction, paving the way for more advanced autonomous agents. The code is publicly available at https://github.com/YenCheHsiao/AutonomousLLMAgentwithAdaptingPlanning.
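
The retry-with-self-correction idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm, whose details are not given in this summary. It is a generic loop under stated assumptions: `run_episode`, `reflect`, `Attempt`, and `max_trials` are hypothetical names, and the toy functions stand in for a real language model and a real environment.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    """Outcome of one try at a task (hypothetical structure)."""
    success: bool
    trajectory: List[str]


def solve_with_self_correction(
    run_episode: Callable[[List[str]], Attempt],
    reflect: Callable[[List[str]], str],
    max_trials: int = 3,
) -> Tuple[bool, int, List[str]]:
    """Retry the same task, carrying one correction note forward per failure.

    run_episode(notes) plays the task once with the notes placed in context.
    reflect(trajectory) turns a failed trajectory into a short correction note.
    Returns (solved, trials_used, notes).
    """
    notes: List[str] = []
    for trial in range(1, max_trials + 1):
        attempt = run_episode(notes)
        if attempt.success:
            return True, trial, notes
        # The task failed: store a self-correction for the next attempt.
        notes.append(reflect(attempt.trajectory))
    return False, max_trials, notes


# Toy stand-ins (assumptions): the "agent" succeeds once it has one note.
def toy_run(notes: List[str]) -> Attempt:
    return Attempt(success=len(notes) >= 1,
                   trajectory=["> go to fridge 1", "Nothing happens."])


def toy_reflect(trajectory: List[str]) -> str:
    return "Look for the pan on the countertop before going to the fridge."


solved, trials, notes = solve_with_self_correction(toy_run, toy_reflect)
```

In this toy run the first attempt fails, one note is recorded, and the second attempt succeeds. With a real model, `run_episode` would place `notes` in the prompt context.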

Paper Structure (9 sections, 7 equations, 7 figures, 3 tables, 1 algorithm)


Figures (7)

  • Figure 1: An architecture towards an autonomous agent. Created with BioRender.com.
  • Figure 2: Overview of the SALA architecture and interaction process. (Left) The SALA consists of an LLM backbone, context, and decision-making processes. These elements can be customized to create specialized language agents that solve a wide variety of decision-making tasks. (Right) An example of how the SALA interacts with the environment to solve a decision-making task, showing the flow of actions, progress, updates, and replanning as part of the task-solving process. Created with BioRender.com.
  • Figure 3: A trajectory in the ALFWorld environment (Shridhar et al., 2020). The text in the black box consists of the description of available receptacles, the goal instruction, and a sequence of actions and observations. (1) The available receptacles are listed in the first part of the text, with a green background. (2) The goal instruction, with a red background, shows that the task is a Cool and Place task and that the goal is to cool some pan and put it in countertop. (3) The sequence of actions and observations, with a magenta background, shows the actions performed and the corresponding observations from the environment. Actions appear in boldface after the greater-than symbol (>), and observations appear in regular text below each action.
  • Figure 4: An exemplar in ReAct (Yao et al., 2023) for a Heat and Place task in the ALFWorld environment (Shridhar et al., 2020). The text in the black box is composed of the description from the ALFWorld environment and the thoughts annotated by ReAct. The thought that decomposes the goal is shown with a magenta background. The thought that uses commonsense reasoning to find an object and determine what to do with it is shown with a red background. The thoughts that track subgoal completion are shown with a cyan background. The thoughts that determine the next subgoal are shown with a green background.
  • Figure 5: Two exemplars in Reflexion (Shinn et al., 2024) for the ALFWorld environment (Shridhar et al., 2020). The text in each black box comprises one Reflexion exemplar designed to guide an LLM in generating the correct action to complete a task in the ALFWorld environment. In each black box, the text preceding the yellow-background text is a ReAct trajectory, as shown in Fig. 4. The text next to "STATUS: " indicates whether the task is completed: the yellow-background text reads "STATUS: OK" if the task is completed and "STATUS: FAIL" if it is not. The reflection text is highlighted with a cyan background.
  • ...and 2 more figures
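
The text layout described in the captions for Figures 3 and 5 (actions after a greater-than symbol, observations on the lines below each action, and a final "STATUS: OK" or "STATUS: FAIL" line) can be read mechanically. The following Python sketch is one possible way to do so. It is not taken from the paper's code: `parse_trajectory` is a hypothetical helper, and the sample text is an invented illustration of the layout, not a trajectory from the paper.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, str]  # (action, observation)


def parse_trajectory(text: str) -> Tuple[List[Step], Optional[bool]]:
    """Split trajectory text into (action, observation) pairs and a status.

    Lines before the first '>' (receptacle and goal descriptions) are skipped.
    Parsing stops at the STATUS line, so any reflection text after it is ignored.
    The status is True for OK, False for anything else, None if absent.
    """
    steps: List[Step] = []
    status: Optional[bool] = None
    action: Optional[str] = None
    observation: List[str] = []

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("STATUS:"):
            status = line[len("STATUS:"):].strip() == "OK"
            break
        if line.startswith(">"):
            if action is not None:
                steps.append((action, " ".join(observation)))
            action = line[1:].strip()
            observation = []
        elif action is not None:
            observation.append(line)

    if action is not None:
        steps.append((action, " ".join(observation)))
    return steps, status


# Invented sample in the layout the captions describe (an assumption).
sample = """\
You are in the middle of a room. You see a countertop 1 and a fridge 1.
Your task is to: cool some pan and put it in countertop.
> go to countertop 1
On the countertop 1, you see a pan 1.
> take pan 1 from countertop 1
You pick up the pan 1 from the countertop 1.
STATUS: FAIL
"""

steps, status = parse_trajectory(sample)
```

Here `steps` holds two (action, observation) pairs and `status` is `False`, which is the signal a retry loop would use to trigger a reflection.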