MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making

Dayuan Fu, Biqing Qi, Yihuai Gao, Che Jiang, Guanting Dong, Bowen Zhou

TL;DR

MSI-Agent is an embodied agent designed to improve LLMs' planning and decision-making ability by summarizing and utilizing insight effectively across different scales. It does so through a three-part pipeline: an experience selector, an insight generator, and an insight selector.

Abstract

Long-term memory is important for agents, and insights play a crucial role in it. However, the emergence of irrelevant insight and the lack of general insight can greatly undermine the effectiveness of insight. To solve this problem, we introduce the Multi-Scale Insight Agent (MSI-Agent), an embodied agent designed to improve LLMs' planning and decision-making ability by summarizing and utilizing insight effectively across different scales. MSI achieves this through an experience selector, an insight generator, and an insight selector. Leveraging this three-part pipeline, MSI can generate task-specific and high-level insight, store it in a database, and then use relevant insight from it to aid in decision-making. Our experiments show that MSI outperforms another insight strategy when GPT-3.5 serves as the planner. Moreover, we delve into the strategies for selecting seed experience and insight, aiming to provide the LLM with more useful and relevant insight for better decision-making. Our observations also indicate that MSI exhibits better robustness when facing domain-shifting scenarios.
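
The three-part pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and function names, the scale labels ("general", "environment", "task"), the dictionary layout of an experience, and the `toy_summarize` stand-in for an LLM call are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Insight:
    text: str
    scale: str     # assumed labels: "general", "environment", "task"
    key: str = ""  # environment or task type the insight applies to


@dataclass
class InsightDB:
    insights: List[Insight] = field(default_factory=list)
    frozen: bool = False

    def add(self, insight: Insight) -> None:
        # Once frozen, the database is read-only.
        if self.frozen:
            raise RuntimeError("insight database is frozen")
        self.insights.append(insight)


def select_experiences(experiences: List[dict], task_type: str, k: int = 2) -> List[dict]:
    """Experience selector (assumed rule): keep up to k experiences of one task type."""
    return [e for e in experiences if e["task_type"] == task_type][:k]


def generate_insights(selected: List[dict], db: InsightDB,
                      summarize: Callable[[dict], List[Insight]]) -> None:
    """Insight generator: `summarize` stands in for an LLM summarization call."""
    for exp in selected:
        for insight in summarize(exp):
            db.add(insight)


def select_insights(db: InsightDB, task_type: str, env: Optional[str]) -> List[Insight]:
    """Insight selector: general insights plus those matching the task or environment."""
    return [
        i for i in db.insights
        if i.scale == "general"
        or (i.scale == "task" and i.key == task_type)
        or (i.scale == "environment" and i.key == env)
    ]


def toy_summarize(exp: dict) -> List[Insight]:
    """Hypothetical summarizer: one task insight, plus an environment insight if known."""
    out = [Insight(exp["lesson"], "task", exp["task_type"])]
    if exp.get("env"):
        out.append(Insight(f"In the {exp['env']}: {exp['lesson']}", "environment", exp["env"]))
    return out
```

In this sketch the selector filters by exact key match; an embedding-based or LLM-based selector could be substituted without changing the surrounding structure.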

Paper Structure (25 sections, 5 equations, 5 figures, 6 tables)

This paper contains 25 sections, 5 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Example of insight summarization and utilization. MSI summarizes insights at multiple scales and utilizes them by selecting insights based on the task. DB = Database.
  • Figure 2: The overall pipeline for the MSI-agent to complete a task. MSI Memory refers to the part that deals with insight. In MSI Memory, Experience Selection and Insight Generation will summarize historical experience into insights, while Insight Selection will select insights to assist the executor in completing future tasks.
  • Figure 3: Pipeline of MSI Memory. The Insight Summarization part summarizes the historical task experience, while the Insight Utilization part selects relevant insights to help the agent decide on future work. In the Insight Generation part, the insight database is continuously updated based on the training task experience (pair). The database is frozen after insight has been updated with all training tasks. Note that only some tasks generate environment insights (consistent with the paper's definition section). Env = environment.
  • Figure 4: An example of 3 plans dealing with a specific task in TEACh. (A) The original task's user query; some responses are omitted. (B) Plan to finish the task without experience. (C) Expel insights example. (D) MSI insights example. (E) Plan to finish the task with Expel. (F) Plan to finish the task with MSI. Most of the insights in Expel and MSI are omitted due to the length limitation.
  • Figure 5: The robustness of agents when facing domain shifting. Dashed lines indicate baseline scores without insight or with random scheme shuffling across three domains. Solid lines show scores after sequential insight summarization: first, kitchen experiences inform insight; then living room experiences update it; finally, bedroom experiences refine it, with corresponding results displayed under each domain.
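
The sequential insight summarization described in the Figure 5 caption (kitchen first, then living room, then bedroom) can be sketched as a loop that updates one insight set domain by domain and records a snapshot after each update. This is an illustrative sketch only: `sequential_summarize` and `toy_update` are hypothetical names, and the update rule shown (append unseen items) is a placeholder assumption, not the paper's summarization procedure.

```python
from typing import Callable, Dict, List, Tuple


def sequential_summarize(
    domains: List[Tuple[str, List[str]]],
    update: Callable[[List[str], List[str]], List[str]],
) -> Dict[str, List[str]]:
    """Update insights one domain at a time; snapshot the state after each domain."""
    insights: List[str] = []
    snapshots: Dict[str, List[str]] = {}
    for name, experiences in domains:
        insights = update(insights, experiences)
        snapshots[name] = list(insights)  # copy so later updates do not alter it
    return snapshots


def toy_update(insights: List[str], experiences: List[str]) -> List[str]:
    """Placeholder update rule (assumption): append experiences not yet recorded."""
    return insights + [e for e in experiences if e not in insights]
```

Each snapshot corresponds to the insight state that would be evaluated under its domain in the figure.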