Algorithmic Prompt Generation for Diverse Human-like Teaming and Communication with Large Language Models
Siddharth Srikanth, Varun Bhatt, Boshen Zhang, Werner Hager, Charles Michael Lewis, Katia P. Sycara, Aaquib Tabrez, Stefanos Nikolaidis
TL;DR
Modeling diverse human-like teaming and communication in multi-agent settings is hindered by the cost of collecting large-scale data. PLAN-QD combines Quality Diversity with LLM-powered agents to algorithmically generate prompts that elicit diverse, human-like coordination in a long-horizon environment, optimizing the discounted return $J=\sum_t \gamma^t r_t$ while shaping diversity across a measure space ${\bm{m}}$. The work provides a human-subject study of Steakhouse-like teaming, introduces PLAN-QD as a prompt-generation framework, and offers empirical evidence that PLAN-QD reproduces observed communication effects and expands the space of behaviors beyond traditional prompting. This approach yields broader behavioral coverage than baselines, enabling scalable study and design of human-AI teams in complex tasks and informing safer, more effective human-AI collaboration.
Abstract
Understanding how humans collaborate and communicate in teams is essential for improving human-agent teaming and AI-assisted decision-making. However, relying solely on data from large-scale user studies is impractical due to logistical, ethical, and practical constraints, which motivates synthetic models of diverse human behavior. Recently, agents powered by Large Language Models (LLMs) have been shown to emulate human-like behavior in social settings. However, obtaining a large set of diverse behaviors from such agents requires manually designing prompts. Separately, Quality Diversity (QD) optimization has been shown to generate diverse Reinforcement Learning (RL) agent behavior. In this work, we combine QD optimization with LLM-powered agents to iteratively search for prompts that generate diverse team behavior in a long-horizon, multi-step collaborative environment. We first show, through a human-subjects experiment (n=54 participants), that humans exhibit diverse coordination and communication behavior in this domain. We then show that our approach can effectively replicate trends from human teaming data and also capture behaviors that are not easily observed without collecting large amounts of data. Our findings highlight the combination of QD and LLM-powered agents as an effective tool for studying teaming and communication strategies in multi-agent collaboration.
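To make the idea of a QD search over prompts concrete, the following is a minimal sketch of a generic MAP-Elites-style loop, not the PLAN-QD implementation. It rests on several assumptions: a prompt is stood in for by a hypothetical pair of trait intensities in [0, 1], `toy_evaluate` replaces an actual LLM-agent rollout with a synthetic three-step reward sequence, `toy_mutate` replaces whatever prompt-mutation operator is used in practice, and the measures are simply the traits themselves. All function names and parameter values are illustrative.

```python
import random


def discounted_return(rewards, gamma=0.99):
    """J = sum_t gamma^t * r_t for a single episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def cell_index(measures, bins=5):
    """Map each measure in [0, 1] to a discrete archive cell."""
    return tuple(min(int(m * bins), bins - 1) for m in measures)


def map_elites(evaluate, mutate, seed, iterations=200, bins=5, rng_seed=0):
    """Keep the best-scoring candidate found so far in each measure cell."""
    rng = random.Random(rng_seed)
    archive = {}  # cell -> (objective, candidate)
    candidate = seed
    for _ in range(iterations):
        objective, measures = evaluate(candidate)
        cell = cell_index(measures, bins)
        # Insert if the cell is empty or the new candidate scores higher.
        if cell not in archive or objective > archive[cell][0]:
            archive[cell] = (objective, candidate)
        # Sample an elite uniformly and perturb it to get the next candidate.
        _, parent = rng.choice(list(archive.values()))
        candidate = mutate(parent, rng)
    return archive


def toy_evaluate(traits):
    """Hypothetical stand-in for an LLM-agent rollout (assumption)."""
    _talkativeness, initiative = traits
    rewards = [1.0 - abs(initiative - 0.5)] * 3  # synthetic 3-step episode
    return discounted_return(rewards, gamma=0.9), traits


def toy_mutate(traits, rng):
    """Hypothetical stand-in for prompt mutation: clipped Gaussian noise."""
    return tuple(min(1.0, max(0.0, x + rng.gauss(0.0, 0.2))) for x in traits)


archive = map_elites(toy_evaluate, toy_mutate, seed=(0.5, 0.5))
print(f"{len(archive)} of 25 cells filled")
```

The archive keeps one elite per cell of the measure space, so the objective drives quality within a cell while the grid itself enforces diversity across cells. In a real setting, the evaluation step would run LLM-powered agents in the environment and compute measures from the resulting team behavior.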
