Programmable Cognitive Bias in Social Agents

Xuan Liu, Haoyang Shang, Haojian Jin

TL;DR

CoBRA is a toolkit for programming cognitive bias in LLM-based social agents. It replaces opaque, implicit prompts with explicit targets on a Cognitive Bias Index (CBI) grounded in classic social experiments. Closed-loop, ground-truth calibration against the CBI is combined with a Behavioral Regulation Engine that operates in three spaces (Prompt Numerical Control, Representation Engineering, and LoRA-based fine-tuning) to achieve reproducible, tunable bias across models, temperatures, and reasoning modes. Technical benchmarks show robust cross-model reproducibility and controllability, while a demonstration with emotional contagion shows a clear dose-response relationship between programmed bias and emergent behavior. This approach enables rigorous, theory-driven social simulations with broad implications for research, policy testing, and ethically guided AI deployment.

Abstract

This paper introduces CoBRA, a novel toolkit for systematically specifying agent behavior in LLM-based social simulation. We found that conventional approaches, which specify agent behaviors through implicit natural language descriptions, cannot yield consistent behaviors across models, and that the resulting agent behaviors do not capture the nuances of the descriptions. In contrast, CoBRA presents a new approach that programs agents' cognitive biases explicitly by grounding agents' expected behaviors in classic social science experiments. CoBRA has two components: (1) a Cognitive Bias Index that measures the cognitive bias of a social agent by quantifying the agent's reactions in a set of validated classical social science experiments, and (2) a Behavioral Regulation Engine that aligns the agent's behavior to demonstrate controlled cognitive bias. We evaluated CoBRA as an HCI toolkit through demonstration and technical benchmarks. Our results suggest that CoBRA can precisely program the cognitive bias demonstrated in a social agent in a model-agnostic manner.
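The interplay of the two components (measure the bias, then regulate until it matches a target) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: `measure_bias` stands in for a Cognitive Bias Index measurement, `control` stands in for whatever knob a regulation method exposes, and the bisection update assumes the measured bias is non-decreasing in that knob. The paper's actual regulation procedure may differ.

```python
from typing import Callable, Tuple


def calibrate(
    measure_bias: Callable[[float], float],
    target: float,
    lo: float = 0.0,
    hi: float = 1.0,
    tol: float = 0.05,
    max_iters: int = 30,
) -> Tuple[float, float]:
    """Bisect on a scalar control knob until the measured bias is near target.

    Assumption (not from the paper): measure_bias is non-decreasing in control.
    Returns the final control value and the bias measured at that value.
    """
    control = (lo + hi) / 2
    measured = measure_bias(control)
    for _ in range(max_iters):
        if abs(measured - target) <= tol:
            break  # the agent demonstrates the target bias closely enough
        if measured < target:
            lo = control  # bias too weak: search the upper half
        else:
            hi = control  # bias too strong: search the lower half
        control = (lo + hi) / 2
        measured = measure_bias(control)
    return control, measured


def toy_agent(control: float) -> float:
    """Hypothetical response curve: maps a knob in [0, 1] to a bias score in [0, 4]."""
    return 4.0 * control ** 2


if __name__ == "__main__":
    control, measured = calibrate(toy_agent, target=2.6)
    print(f"control={control:.3f}, measured bias={measured:.2f}")
```

In practice the measurement would be noisy, since it comes from sampled model responses, so a real loop would likely need repeated measurements per step and a tolerance matched to that noise.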

Paper Structure

This paper contains 51 sections, 16 equations, 15 figures, 3 tables.

Figures (15)

  • Figure 1: Existing social simulation experiments often use implicit natural language descriptions to specify agent behaviors. However, we found that these specifications often resulted in inconsistent and unpredictable agent behaviors. For example, Ⓐ real-world economists are supposed to be less susceptible to the Framing Effect than the general population [Nudge2008]; however, Ⓑ agents based on implicit natural language specifications often produce inconsistent behaviors across models, and the expected differences in behavior across roles (e.g., economists being less prone than laypeople) are not reliably observed. To tackle this challenge, Ⓒ we introduce CoBRA, which enables researchers to explicitly specify the cognitive biases of LLM-based agents quantitatively, thereby producing precise and consistent behaviors across models.
  • Figure 2: Example closed-loop workflow of CoBRA. A social scientist aims to create an agent with a moderate framing effect (e.g., 2.6 on a 0–4 scale). ① She specifies the desired bias level in CoBRA alongside the natural language agent description. ② CoBRA measures the agent’s framing effect using validated classical social science experiments (e.g., the Asian Disease study [Asian_Disease]). ③ If the measured bias deviates from the specification, the Behavioral Regulation Engine iteratively adjusts the agent (through prompt engineering, activation modifications, or fine-tuning) until the agent reliably demonstrates the target bias.
  • Figure 3: An example in which persona-based specification [Gen_Agent] failed to produce consistent agent behavior regarding the framing effect in the Asian Disease paradigm. The x-axis shows the percentage of 150 queries in which each choice was selected.
  • Figure 4: An example in which role-based specification [agentverse_iclr2024] failed to produce consistent agent behavior regarding the framing effect in the Asian Disease paradigm. The x-axis shows the percentage of 150 queries in which each choice was selected.
  • Figure 5: Classic Social Experiment Testbed. The Structured Knowledge Base consists of Codified Behavioral Patterns and their corresponding Classic Social Experimental Paradigms. Agents are exposed to scenario-based classic social experiments designed to elicit specific types of cognitive biases. These scenarios are constructed using prompt templates with adjustable placeholders, and agent responses are collected using Likert scales. Based on these responses, a Cognitive Bias Index (see the section on the Cognitive Bias Index) is computed to quantify agent behavior. $P(A) - P(E)$: For open-source models, we directly use the probability of the agent choosing different options; for closed-source models, we estimate these probabilities by querying the model multiple times and using the observed frequencies.
  • ...and 10 more figures
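
The frequency-based estimate of $P(A) - P(E)$ mentioned in the Figure 5 caption for closed-source models can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: `query_model` is a hypothetical callable that returns one option label per call, A and E are assumed to be two options on the response scale, and the default of 150 queries is borrowed from the Figure 3 and Figure 4 captions.

```python
import random
from collections import Counter
from typing import Callable, Dict


def estimate_choice_gap(
    query_model: Callable[[], str],
    n_queries: int = 150,
    option_hi: str = "A",
    option_lo: str = "E",
) -> float:
    """Estimate P(option_hi) - P(option_lo) from observed response frequencies."""
    counts = Counter(query_model() for _ in range(n_queries))
    # Counter returns 0 for options that were never chosen.
    return (counts[option_hi] - counts[option_lo]) / n_queries


def make_stub_model(weights: Dict[str, float], seed: int = 0) -> Callable[[], str]:
    """Hypothetical stand-in for a closed-source model: samples a label by weight."""
    rng = random.Random(seed)
    options = list(weights)
    probs = [weights[o] for o in options]

    def query() -> str:
        return rng.choices(options, weights=probs, k=1)[0]

    return query


if __name__ == "__main__":
    stub = make_stub_model({"A": 0.6, "C": 0.1, "E": 0.3}, seed=1)
    print(f"estimated P(A) - P(E): {estimate_choice_gap(stub):+.3f}")
```

For an open-source model, the same quantity could instead be read directly from the model's option probabilities, as the caption notes, which avoids the sampling noise of repeated queries.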

Theorems & Definitions (1)

  • Definition 1: Cognitive Bias Index