LM Agents for Coordinating Multi-User Information Gathering
Harsh Jhamtani, Jacob Andreas, Benjamin Van Durme
TL;DR
PeopleJoin is a benchmark for studying LM-driven coordination in multi-user information gathering. It has two tasks, QA and DocCreation, drawn from Spider and MultiNews respectively, with the required information spread across synthetic organizations of 2–20 members. The authors implement baseline LM agent architectures that operate through a reactive action-observation-reflection loop. They evaluate these agents on answer correctness, ROUGE and G-Eval scores for summaries, communication efficiency, and source accuracy. Experiments with GPT-4-family models show that LM agents can coordinate across collaborators, but achieving both high accuracy and high efficiency remains difficult, particularly in selecting the right collaborators and formulating the questions put to them. The work establishes PeopleJoin as a testbed for future research on learning from interaction, equitable task distribution, and privacy-aware, scalable AI-assisted collaboration at the organizational level.
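The reactive action-observation-reflection loop can be illustrated with a minimal sketch. This is not the authors' implementation: a scripted policy (the hypothetical `choose_action`) stands in for the LM, members are hypothetical objects holding fixed facts rather than LM-simulated users, and the message count is only an assumed proxy for communication efficiency.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """Hypothetical simulated organization member holding fixed facts."""
    name: str
    facts: dict  # topic -> answer this member can share

    def respond(self, topic):
        # Returns None when the member does not know the answer.
        return self.facts.get(topic)


@dataclass
class AgentState:
    needed: list                                # topics the user's request depends on
    found: dict = field(default_factory=dict)   # topic -> (answer, source member)
    asked: set = field(default_factory=set)     # (member name, topic) pairs already tried
    messages: int = 0                           # assumed proxy for communication cost


def choose_action(state, org):
    """Scripted stand-in for the LM policy: ask the first untried member about
    the first missing topic; finish when no untried option remains."""
    for topic in state.needed:
        if topic in state.found:
            continue
        for member in org:
            if (member.name, topic) not in state.asked:
                return ("message", member, topic)
    return ("finish", None, None)


def run(org, needed, max_steps=20):
    state = AgentState(needed=list(needed))
    for _ in range(max_steps):
        kind, member, topic = choose_action(state, org)   # action
        if kind == "finish":
            break
        state.asked.add((member.name, topic))
        state.messages += 1
        reply = member.respond(topic)                      # observation
        if reply is not None:                              # reflection: update state
            state.found[topic] = (reply, member.name)
    answer = {t: a for t, (a, _) in state.found.items()}
    sources = {src for _, src in state.found.values()}
    return answer, sources, state.messages


if __name__ == "__main__":
    org = [
        Member("ana", {"budget": "$10k"}),
        Member("bo", {}),
        Member("cy", {"deadline": "May 3"}),
    ]
    answer, sources, n_messages = run(org, ["budget", "deadline"])
    print(answer)       # {'budget': '$10k', 'deadline': 'May 3'}
    print(n_messages)   # 4
```

In this toy organization the naive policy sends four messages to gather two facts, whereas a policy that picked the right collaborators up front would need only two. That gap is the kind of collaborator-selection inefficiency the TL;DR identifies as a remaining challenge.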
Abstract
This paper introduces PeopleJoin, a benchmark for evaluating LM-mediated collaborative problem solving. Given a user request, PeopleJoin agents must identify teammates who might be able to assist, converse with these teammates to gather information, and finally compile a useful answer or summary for the original user. PeopleJoin comprises two evaluation domains: PeopleJoin-QA, focused on questions about tabular data, and PeopleJoin-DocCreation, focused on document creation tasks. The two domains are adapted from existing NLP benchmarks for database question answering and multi-document summarization; here, however, the information needed to complete these tasks is distributed across synthetic "organizations" of 2–20 users, simulating natural multi-user collaboration scenarios. We implement several popular LM agent architectures, evaluate their accuracy and efficiency at completing tasks, and highlight new research questions that can be studied using PeopleJoin.
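To make the distributed-information setting concrete, the following hypothetical sketch splits a Spider-style table across members of a synthetic organization. The round-robin split, the `singers` table, and the helper names `distribute_rows` and `count_matching` are illustrative assumptions, not the benchmark's actual construction procedure. The sketch only shows that a query answered from a subset of collaborators can silently return a wrong result.

```python
from collections import defaultdict


def distribute_rows(rows, members):
    """Hypothetical split: deal table rows to members round-robin so that no
    single member holds the full table."""
    holdings = defaultdict(list)
    for i, row in enumerate(rows):
        holdings[members[i % len(members)]].append(row)
    return dict(holdings)


def count_matching(holdings, predicate, contacted):
    """COUNT-style query answered only from the rows of members actually contacted."""
    return sum(
        1
        for member in contacted
        for row in holdings.get(member, [])
        if predicate(row)
    )


def is_french(row):
    return row["country"] == "France"


if __name__ == "__main__":
    singers = [
        {"name": "A", "country": "France"},
        {"name": "B", "country": "USA"},
        {"name": "C", "country": "France"},
        {"name": "D", "country": "France"},
    ]
    holdings = distribute_rows(singers, ["ana", "bo"])
    print(count_matching(holdings, is_french, ["ana", "bo"]))  # 3 (all holders asked)
    print(count_matching(holdings, is_french, ["ana"]))        # 2 (misses bo's rows)
```

Because each member holds only part of the table, an agent that stops after contacting one member returns a plausible but incorrect count. This is why both collaborator selection and answer correctness need to be measured.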
