PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods

Yiying Wang, Xiaojing Li, Binzhu Wang, Yueyang Zhou, Yingru Lin, Han Ji, Hong Chen, Jinshi Zhang, Fei Yu, Zewei Zhao, Song Jin, Renji Gong, Wanqing Xu

TL;DR

This paper introduces PEER (Plan, Execute, Express, Review), a multi-agent framework that systematizes domain-specific tasks by integrating precise question decomposition, advanced information retrieval, comprehensive summarization, and rigorous self-assessment.

Abstract

In domain-specific applications, GPT-4, augmented with precise prompts or Retrieval-Augmented Generation (RAG), shows notable potential but faces a critical trilemma of performance, cost, and data privacy. High performance requires sophisticated processing techniques, yet managing multiple agents within a complex workflow often proves costly and challenging. To address this, we introduce the PEER (Plan, Execute, Express, Review) multi-agent framework, which systematizes domain-specific tasks by integrating precise question decomposition, advanced information retrieval, comprehensive summarization, and rigorous self-assessment. Given the concerns of cost and data privacy, enterprises are shifting from proprietary models like GPT-4 to custom models, striking a balance between cost, security, and performance. We developed industrial practices that leverage online data and user feedback for efficient model tuning. This study provides best-practice guidelines for applying multi-agent systems to domain-specific problem-solving and for implementing effective agent tuning strategies. Our empirical studies, particularly in the financial question-answering domain, demonstrate that our approach achieves 95.0% of GPT-4's performance while effectively managing costs and ensuring data privacy.
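The Plan, Execute, Express, Review cycle described in the abstract can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `run_peer`, the `Review` record, the `max_rounds` cap, and all four toy agents are hypothetical names introduced here, and in a real system each agent would wrap a language model or retrieval call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Review:
    """Hypothetical verdict returned by the Review agent."""
    accepted: bool
    feedback: str = ""


def run_peer(
    query: str,
    plan: Callable[[str, str], List[str]],
    execute: Callable[[str], str],
    express: Callable[[str, List[str]], str],
    review: Callable[[str, str], Review],
    max_rounds: int = 3,  # assumed cap so the cycle always terminates
) -> Tuple[str, int]:
    """Run the Plan -> Execute -> Express -> Review cycle until accepted."""
    feedback = ""
    answer = ""
    for round_no in range(1, max_rounds + 1):
        sub_questions = plan(query, feedback)           # Plan: decompose the query
        evidence = [execute(q) for q in sub_questions]  # Execute: gather information
        answer = express(query, evidence)               # Express: synthesize an answer
        verdict = review(query, answer)                 # Review: self-assessment
        if verdict.accepted:
            return answer, round_no
        feedback = verdict.feedback                     # feed suggestions back to Plan
    return answer, max_rounds


# Toy agents (placeholders for model-backed agents).
def toy_plan(query: str, feedback: str) -> List[str]:
    subs = [f"Background for: {query}", f"Recent data for: {query}"]
    if feedback:
        subs.append(f"Follow-up: {feedback}")
    return subs


def toy_execute(sub_question: str) -> str:
    return f"[evidence for '{sub_question}']"


def toy_express(query: str, evidence: List[str]) -> str:
    return f"Answer to '{query}' using {len(evidence)} evidence items."


def toy_review(query: str, answer: str) -> Review:
    ok = "3 evidence" in answer  # toy criterion: require three evidence items
    return Review(ok, "" if ok else "add expert opinions")


answer, rounds = run_peer(
    "Why did Buffett sell BYD stock?",
    toy_plan, toy_execute, toy_express, toy_review,
)
print(rounds, answer)
```

With these toy agents the first draft is rejected, the reviewer's feedback adds a third sub-question, and the second draft is accepted. Passing the agents in as callables keeps the loop independent of which model backs each role.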

Paper Structure

This paper contains 17 sections, 2 equations, 4 figures, 8 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Cyclic Workflow Diagram of the PEER Framework. The user's query, "Why did Buffett sell BYD stock?", prompts the "Plan" agent to generate four relevant sub-questions. The "Execute" agent then collects information, including BYD's financial data and expert opinions. The "Express" agent synthesizes a comprehensive answer, which the "Review" agent evaluates and, if necessary, suggests modifications.
  • Figure 2: Iterative Training Process: Initially, Model 0 is trained on offline data. This model then generates two sets of predictions: one to create training data for the next iteration (upper section) and another to provide evaluation results for the current iteration (lower section). This cycle is repeated iteratively across subsequent training phases.
  • Figure 3: Win rate of the PEER framework. PEER outperforms BabyAGI and PEE under both base models.
  • Figure 4: Win rate of the tuned agent. Both DPO and SFT improve with each iteration, and DPO converges faster than SFT.
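The iterative training process summarized in the Figure 2 caption can be sketched as a loop in which each model is evaluated and also generates the data used to train its successor. This is a rough sketch under assumptions: `iterative_tuning` and the four callables are hypothetical names, the "model" is a toy integer, and the numbers it produces are placeholders rather than results from the paper.

```python
from typing import Any, Callable, List, Optional, Tuple


def iterative_tuning(
    offline_data: List[Any],
    train: Callable[[Optional[Any], List[Any]], Any],
    generate: Callable[[Any], List[Any]],
    build_training_data: Callable[[List[Any]], List[Any]],
    evaluate: Callable[[Any], float],
    iterations: int = 2,
) -> Tuple[Any, List[float]]:
    """Train Model 0 offline, then alternate evaluation and data generation."""
    model = train(None, offline_data)  # Model 0: trained on offline data
    history: List[float] = []
    for _ in range(iterations):
        history.append(evaluate(model))           # evaluation for the current iteration
        predictions = generate(model)             # predictions for the next iteration
        new_data = build_training_data(predictions)
        model = train(model, new_data)            # next model in the sequence
    history.append(evaluate(model))               # score the final model as well
    return model, history


# Toy stand-ins: the "model" is just a count of examples seen so far.
def toy_train(model: Optional[int], data: List[str]) -> int:
    return (0 if model is None else model) + len(data)


def toy_generate(model: int) -> List[str]:
    return [f"pred-{model}-{k}" for k in range(2)]


def toy_build(predictions: List[str]) -> List[str]:
    # Assumption: a real pipeline would filter or label these, e.g. with user feedback.
    return predictions


def toy_evaluate(model: int) -> float:
    return float(model)


final_model, history = iterative_tuning(
    ["a", "b", "c"], toy_train, toy_generate, toy_build, toy_evaluate
)
print(final_model, history)
```

The `train` callable is deliberately abstract so that either SFT or DPO could be plugged in; how the paper constructs preference pairs for DPO is not specified on this page and is not modeled here.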