WESE: Weak Exploration to Strong Exploitation for LLM Agents

Xu Huang, Weiwen Liu, Xiaolong Chen, Xingmei Wang, Defu Lian, Yasheng Wang, Ruiming Tang, Enhong Chen

TL;DR

WESE tackles the exploration-exploitation dilemma in open-world LLM agents by explicitly separating exploration and exploitation, using a weak LLM to gather global environmental knowledge and compressing it into a knowledge graph. A one-hop retrieval strategy then provides task-relevant priors to a strong exploitation agent, improving decision-making and QA performance while reducing cost. Across four benchmarks, WESE achieves a favorable balance between effectiveness, efficiency, and cost, outperforming baselines and illustrating the practicality of decoupled exploration for scalable LLM agents. This approach offers a cost-efficient path to more capable open-world agents suitable for real-world interactive tasks.

Abstract

Recently, large language models (LLMs) have demonstrated remarkable potential as intelligent agents. However, existing research mainly focuses on enhancing the agent's reasoning or decision-making abilities through well-designed prompt engineering or task-specific fine-tuning, ignoring the procedure of exploration and exploitation. When addressing complex tasks within open-world interactive environments, these methods exhibit two limitations. First, the lack of global information about the environment leads to greedy decisions, resulting in sub-optimal solutions. Second, irrelevant information acquired from the environment not only introduces noise but also incurs additional cost. This paper proposes a novel approach, Weak Exploration to Strong Exploitation (WESE), to enhance LLM agents in solving open-world interactive tasks. Concretely, WESE decouples the exploration and exploitation processes, employing a cost-effective weak agent to perform exploration tasks for global knowledge. A knowledge graph-based strategy is then introduced to store the acquired knowledge and extract task-relevant knowledge, improving the stronger agent's success rate and efficiency on the exploitation task. Our approach is flexible enough to accommodate diverse tasks and obtains significant improvements in both success rates and efficiency across four interactive benchmarks.
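
The knowledge-graph step can be illustrated with a small sketch. It assumes the weak agent's exploration has already been turned into (head, relation, tail) triples, and that task-relevant knowledge is selected by one-hop retrieval around entities named in the task. The class and function names (KnowledgeGraph, task_entities, build_prompt), the substring-based entity matching, and the prompt layout are all hypothetical choices made for illustration; the paper's actual extraction and prompting procedures are not given in this text.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Stores (head, relation, tail) triples gathered during exploration."""

    def __init__(self):
        # Index triples by head and by tail so one-hop lookup is cheap.
        self._out = defaultdict(set)
        self._in = defaultdict(set)

    def add(self, head, relation, tail):
        triple = (head, relation, tail)
        self._out[head].add(triple)
        self._in[tail].add(triple)

    def entities(self):
        """All entities that appear as a head or a tail."""
        return set(self._out) | set(self._in)

    def one_hop(self, seeds):
        """Return every triple that touches one of the seed entities."""
        found = set()
        for entity in seeds:
            found |= self._out.get(entity, set())
            found |= self._in.get(entity, set())
        return sorted(found)


def task_entities(task, known_entities):
    """Assumed matching rule: an entity is relevant if its name occurs in the task."""
    text = task.lower()
    return sorted(e for e in known_entities if e.lower() in text)


def build_prompt(task, triples):
    """Hypothetical prompt layout handed to the strong agent."""
    facts = "\n".join(f"- {h} {r} {t}" for h, r, t in triples)
    return f"Known facts about the environment:\n{facts}\n\nTask: {task}"


if __name__ == "__main__":
    kg = KnowledgeGraph()
    # Triples a weak agent might have collected while exploring.
    kg.add("apple", "in", "fridge 1")
    kg.add("mug", "on", "desk 2")
    kg.add("fridge 1", "in", "kitchen")

    task = "Put a cool apple on the countertop"
    seeds = task_entities(task, kg.entities())
    print(build_prompt(task, kg.one_hop(seeds)))
```

Only triples adjacent to task entities reach the strong agent, so unrelated facts (here, the mug's location) stay out of the prompt. This mirrors the stated goal of avoiding noise and extra cost from irrelevant information.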

Paper Structure

This paper contains 18 sections, 2 equations, 3 figures, 4 tables, and 3 algorithms.

Figures (3)

  • Figure 1: Examples of sub-optimal decisions and irrelevant information in feedback.
  • Figure 2: Framework of WESE. The left part represents the weak exploration and the right part represents the strong exploitation. We employ Llama-2-7B as the weak agent and text-davinci-003 as the strong agent in the implementation.
  • Figure 3: Relative improvements in success rate over various types of tasks on ALFWorld. Tasks toward the left are more complex.