LM Agents for Coordinating Multi-User Information Gathering

Harsh Jhamtani, Jacob Andreas, Benjamin Van Durme

TL;DR

PeopleJoin is a benchmark for studying LM-driven coordination in multi-user information gathering. It has two tasks, QA and DocCreation, drawn from Spider and MultiNews and set within synthetic organizations of 2–20 members. The authors implement baseline LM agent architectures that operate via a reactive action-observation-reflection loop and evaluate them with a suite of metrics covering answer correctness, summary quality (ROUGE and G-Eval), communication efficiency, and source accuracy. Experiments with GPT-4 family models show that LM agents can coordinate across collaborators, but achieving high accuracy and efficiency remains challenging, especially when selecting the right collaborators and formulating questions. The work establishes PeopleJoin as a test-bed for future research on learning from interaction, equitable task distribution, and privacy-aware, scalable AI-assisted collaboration at the organizational level.

Abstract

This paper introduces PeopleJoin, a benchmark for evaluating LM-mediated collaborative problem solving. Given a user request, PeopleJoin agents must identify teammates who might be able to assist, converse with these teammates to gather information, and finally compile a useful answer or summary for the original user. PeopleJoin comprises two evaluation domains: PeopleJoin-QA, focused on questions about tabular data, and PeopleJoin-DocCreation, focused on document creation tasks. The two domains are adapted from existing NLP benchmarks for database question answering and multi-document summarization; here, however, the information needed to complete these tasks is distributed across synthetic "organizations" of 2–20 users, simulating natural multi-user collaboration scenarios. We implement several popular LM agent architectures, evaluate their accuracy and efficiency at completing tasks, and highlight new research questions that can be studied using PeopleJoin.

Paper Structure

This paper contains 37 sections, 2 figures, 5 tables.

Figures (2)

  • Figure 1: A sequence diagram illustrating a conversation in the PeopleJoin framework, where Alice issues a request to her agent. The documents available to Alice's agent are insufficient to answer the request. The agent uses a people search tool, after which it decides which subset of people to contact, in which order, what questions to pose, etc. The temporal ordering of tool calls and message exchanges is denoted by #i.
  • Figure 2: Illustration of a transformation of a Spider datum into PeopleJoin-QA.
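
The flow described for Figure 1 (check the agent's own documents, run a people search, then message candidates in some order until the request can be answered) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the benchmark's implementation: `Teammate`, `people_search`, `run_agent`, the keyword-matching search, and the `max_messages` budget are all hypothetical names and simplifications introduced here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Teammate:
    """Hypothetical stand-in for a simulated organization member."""
    name: str
    expertise: str
    knowledge: dict = field(default_factory=dict)  # topic -> fact

    def reply(self, topic: str) -> Optional[str]:
        # A teammate answers only if they actually hold the information.
        return self.knowledge.get(topic)


def people_search(org, topic):
    """Toy people-search tool (assumed): keep teammates whose expertise mentions the topic."""
    return [p for p in org if topic.lower() in p.expertise.lower()]


def run_agent(topic, own_documents, org, max_messages=3):
    """Act, observe, and decide the next step until answered or out of budget."""
    log = []
    # Step 1: try the documents the agent can already see.
    if topic in own_documents:
        return own_documents[topic], log
    # Step 2: the documents are insufficient, so call the people-search tool.
    candidates = people_search(org, topic)
    log.append(("people_search", [p.name for p in candidates]))
    # Step 3: contact candidates in order, stopping once someone answers.
    for person in candidates[:max_messages]:
        answer = person.reply(topic)
        log.append(("message", person.name))
        if answer is not None:
            return answer, log
    return None, log


org = [
    Teammate("Bob", "sales reports"),
    Teammate("Carol", "sales database", {"sales": "Q3 revenue table"}),
    Teammate("Dan", "hiring"),
]
answer, log = run_agent("sales", own_documents={}, org=org)
print(answer)
print(log)
```

In this toy run the agent messages Bob before Carol even though only Carol holds the answer, which mirrors the efficiency cost of imperfect collaborator selection that the benchmark is designed to measure.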