Prompt Optimization Across Multiple Agents for Representing Diverse Human Populations

Manh Hung Nguyen, Sebastian Tschiatschek, Adish Singla

TL;DR

The paper addresses the challenge that single LLMs often produce homogeneous outputs insufficient to represent diverse human populations. It introduces a framework to construct a representative ensemble of LLM agents, each guided by $K$ demonstrations through in-context learning, and formulates the agent-selection problem as a submodular optimization of prompts. Three scalable methods ($\textsc{RepPop}_{\text{demo}}$, $\textsc{RepPop}_{\text{mapped-1}}$, $\textsc{RepPop}_{\text{mapped-2}}$) offer different trade-offs between computation and performance, with theoretical guarantees and practical validation in education and crowdsourcing domains. Empirically, the ensemble of agents better captures population-wide behavior than baselines and generalizes to unseen tasks, enabling more faithful simulations of diverse human perspectives for research and evaluation.

Abstract

The difficulty and expense of obtaining large-scale human responses make Large Language Models (LLMs) an attractive alternative and a promising proxy for human behavior. However, prior work shows that LLMs often produce homogeneous outputs that fail to capture the rich diversity of human perspectives and behaviors. Thus, rather than trying to capture this diversity with a single LLM agent, we propose a novel framework to construct a set of agents that collectively capture the diversity of a given human population. Each agent is an LLM whose behavior is steered by conditioning on a small set of human demonstrations (task-response pairs) through in-context learning. The central challenge is therefore to select a representative set of LLM agents from the exponentially large space of possible agents. We tackle this selection problem through the lens of submodular optimization. In particular, we develop methods that offer different trade-offs between time complexity and performance guarantees. Extensive experiments in crowdsourcing and educational domains demonstrate that our approach constructs agents that represent human populations more effectively than baselines. Moreover, behavioral analyses on new tasks show that these agents reproduce the behavior patterns and perspectives of the students and annotators they are designed to represent.
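To make the notion of an agent concrete, the following is a minimal sketch of how a set of task-response demonstrations could be turned into an in-context prompt. The `Agent` container, the prompt template, and the example tasks are all hypothetical; the source does not specify the prompt format, and the call to an actual LLM is deliberately left out.

```python
from dataclasses import dataclass
from typing import Tuple

# A demonstration is a (task, response) pair taken from one human (assumed format).
Demonstration = Tuple[str, str]


@dataclass(frozen=True)
class Agent:
    """Hypothetical container: an agent is identified by its K demonstrations."""
    demonstrations: Tuple[Demonstration, ...]

    def build_prompt(self, new_task: str) -> str:
        """Concatenate the demonstrations and append the new task (illustrative template)."""
        parts = []
        for i, (task, response) in enumerate(self.demonstrations, start=1):
            parts.append(f"Example {i}\nTask: {task}\nResponse: {response}")
        parts.append(f"Now respond in the same way.\nTask: {new_task}\nResponse:")
        return "\n\n".join(parts)


# Made-up demonstrations: the first response shows a common fraction-addition
# misconception, which the agent would be steered to reproduce.
agent = Agent(demonstrations=(
    ("What is 1/2 + 1/3?", "2/5"),
    ("What is 3/4 - 1/4?", "1/2"),
))
prompt = agent.build_prompt("What is 1/4 + 1/4?")
print(prompt)
```

Because each agent is determined entirely by its demonstrations, choosing a set of agents amounts to choosing sets of demonstrations, which is why the space of possible agents grows so quickly.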

Paper Structure

This paper contains 29 sections, 3 theorems, 19 equations, 11 figures, 5 tables.

Key Result

Theorem 1

The problem of selecting an optimal subset $L^* \subseteq \mathcal{L}$ of size $M$ that maximizes $f(L)$ is NP-hard.
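Since exact selection is NP-hard, approximate selection is the practical route. The sketch below shows the standard greedy heuristic for choosing $M$ agents from a pool of candidates. It is not the authors' method: the objective used here (mean, over humans, of the best answer agreement with any selected agent) is an assumed facility-location-style stand-in for $f(L)$, and the function names and toy data are hypothetical.

```python
from typing import List, Sequence


def agreement(human: Sequence[str], agent: Sequence[str]) -> float:
    """Fraction of tasks on which the agent gives the same answer as the human."""
    return sum(h == a for h, a in zip(human, agent)) / len(human)


def coverage(humans, agents, selected: List[int]) -> float:
    """Assumed objective: mean over humans of the best agreement with a selected agent."""
    if not selected:
        return 0.0
    total = sum(max(agreement(h, agents[j]) for j in selected) for h in humans)
    return total / len(humans)


def greedy_select(humans, agents, m: int) -> List[int]:
    """Repeatedly add the candidate that yields the highest objective value."""
    selected: List[int] = []
    for _ in range(min(m, len(agents))):
        remaining = [j for j in range(len(agents)) if j not in selected]
        best = max(remaining, key=lambda j: coverage(humans, agents, selected + [j]))
        selected.append(best)
    return selected


# Toy data (made up): each row holds one human's answers to four tasks.
humans = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "C"],
    ["A", "A", "B", "B"],
    ["C", "D", "D", "D"],
    ["C", "D", "D", "A"],
]
# Candidate agents, described by the answers they would give on the same tasks.
agents = [
    ["A", "A", "B", "B"],  # close to the first three humans
    ["C", "D", "D", "D"],  # close to the last two humans
    ["A", "D", "B", "D"],  # partially matches everyone
]

chosen = greedy_select(humans, agents, m=2)
print(chosen, coverage(humans, agents, chosen))
```

On this toy instance the greedy procedure picks one agent per group of similar humans rather than the agent that partially matches everyone, which is the behavior a coverage-style objective is meant to encourage.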

Figures (11)

  • Figure 1: Illustrative example of constructing a set of agents $L$ that is representative of a given human population $\mathcal{H}$. In this example, $\mathcal{H}$ is a group of diverse students working on a set of tasks $\mathcal{T}$ and providing answers. The goal is to create a set of agents $L$ that can accurately represent the students. The resulting agents exhibit different levels of understanding across mathematical concepts, with each agent corresponding to a group of students matched by skill level and task performance.
  • Figure 2: Examples of tasks in our experiments.
  • Figure 3: Representation error on the test set for each method, for different numbers of agents. We report the means and standard errors (error bars) of three runs with different seeds. Our methods maintain lower representation error than the baselines.
  • Figure 4: 2D embeddings of humans and agents constructed by $\textsc{RepPop}_{\text{mapped-2}}$ on tasks in $\mathcal{T}_{\text{train}}$ using UMAP. The boxes show examples of aggregated metadata of the humans represented by each agent (connections are denoted by black arrows). This metadata is not used to construct the agents; it is used only for analysis. Our method $\textsc{RepPop}_{\text{mapped-2}}$ constructs agents that cover different human behaviors and collectively (approximately) represent the human population. (a) EEDI: Each agent represents a group of students with particular success rates on different math concepts. (b) OpinionQA: Each agent represents a group of people with particular distributions of political ideologies, parties, and regions.
  • Figure 5: Representation error on EEDI dataset (Train).
  • ...and 6 more figures

Theorems & Definitions (6)

  • Theorem 1: NP-Hardness
  • Proposition 1: Submodularity of the Objective Function $f(L)$
  • Theorem 2: Performance Guarantee for $\textsc{RepPop}_{\text{mapped-1}}$ and $\textsc{RepPop}_{\text{mapped-2}}$
  • proof
  • proof
  • proof
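
For orientation, the property named in Proposition 1 and the kind of bound that submodularity typically enables can be written as follows. This summary does not state the exact form of Theorem 2, so the second inequality is only the classical greedy guarantee for monotone submodular maximization under a cardinality constraint; it is an assumption that the paper's $f$ satisfies the extra conditions, and $L_{\text{greedy}}$ is notation introduced here.

```latex
% Submodularity (diminishing returns) of a set function f over the candidate agents \mathcal{L}:
f(A \cup \{\ell\}) - f(A) \;\ge\; f(B \cup \{\ell\}) - f(B)
\qquad \text{for all } A \subseteq B \subseteq \mathcal{L},\ \ell \in \mathcal{L} \setminus B.

% Classical greedy guarantee (Nemhauser, Wolsey, and Fisher, 1978), assuming f is also
% monotone with f(\emptyset) = 0, where L^{*} is an optimal subset of size M and
% L_{\text{greedy}} is the size-M set built by greedy selection:
f(L_{\text{greedy}}) \;\ge\; \left(1 - \frac{1}{e}\right) f(L^{*}).
```

Read informally, the first inequality says that adding an agent helps less once many agents are already selected, and the second says that greedy selection recovers at least about 63% of the optimal objective value under the stated assumptions.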