Causal Effects with Unobserved Unit Types in Interacting Human-AI Systems

William Overman, Sadegh Shirani, Mohsen Bayati

TL;DR

This work studies experiments on interacting populations of humans and AI agents, where both unit types and the interaction network remain unobserved, and shows that by constructing subpopulations that vary in expected human composition and treatment exposure, one can consistently recover human-specific causal effects.

Abstract

We study experiments on interacting populations of humans and AI agents, where both unit types and the interaction network remain unobserved. Although causal effects propagate throughout the system, the goal is to estimate effects on humans. Examples include online platforms where human users interact alongside AI-driven accounts. We assume a human-AI prior that gives each unit a probability of being human. While humans cannot be distinguished at the unit level, the prior allows us to compute the average human composition within large subpopulations. We then model outcome dynamics through a causal message passing (CMP) framework and analyze sample-mean outcomes across subpopulations. We show that by constructing subpopulations that vary in expected human composition and treatment exposure, one can consistently recover human-specific causal effects. Our results characterize when distributional knowledge of population composition (without observing unit types or the interaction network) is sufficient for identification. We validate the approach on a simulated human-AI platform driven by behaviorally differentiated LLM agents. Together, these results provide a theoretical and practical framework for experimentation in emerging human-AI systems.
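The composition step in the abstract can be illustrated with a minimal sketch. It assumes the prior is calibrated (each latent type is drawn as a Bernoulli variable with the unit's prior probability) and uses hypothetical names (`prior`, `is_human`, `expected_human_share`) that do not come from the paper. It shows only that the average prior over a large subpopulation tracks the realized human fraction, even though no individual type is observed.

```python
import numpy as np

def expected_human_share(prior, members):
    """Average prior probability of being human over a subpopulation."""
    return float(np.mean(prior[members]))

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical prior: one probability of being human per unit.
prior = rng.uniform(0.0, 1.0, size=n)

# Latent types, drawn from the prior (calibration assumption).
# The analyst never sees this array; it is used here only as a check.
is_human = rng.uniform(size=n) < prior

# Two subpopulations that differ in expected human composition.
low = np.flatnonzero(prior < 0.5)
high = np.flatnonzero(prior >= 0.5)

q_low = expected_human_share(prior, low)
q_high = expected_human_share(prior, high)
realized_low = float(is_human[low].mean())
realized_high = float(is_human[high].mean())

print(f"low : expected {q_low:.3f}, realized {realized_low:.3f}")
print(f"high: expected {q_high:.3f}, realized {realized_high:.3f}")
```

Splitting at a prior of 0.5 is an arbitrary choice for the illustration; any grouping rule that depends only on the prior would serve the same purpose.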

Paper Structure

This paper contains 46 sections, 2 theorems, 22 equations, 2 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Consider the outcome dynamics eq:outcome_specification_CMP and suppose that Assumptions asmp:interference and asmp:SE_regular hold. Assume a Bernoulli randomized design: for each $i$ and $t$, unit $i$ is treated at time $t$ with probability $\pi_t \in [0,1]$, independently across $i$ and $t$ and independently of all other sources of randomness in the model. Let $\mathsf{S} \subseteq [N]$ be a subpopulation with $\left|{ \mathsf{S}^

Figures (2)

  • Figure 1: Overview of the proposed framework. (a) A mixed population of humans and AI agents with unobserved unit types; each unit $i$ is associated with a known prior $Q_i \in [0,1]$ representing the probability of being human, and treatment is assigned at random. (b) Subpopulations are constructed by stratifying units along two axes: expected human composition (horizontal) and treatment exposure (vertical), yielding groups with systematic variation in both dimensions. (c) The experimental state evolution (ESE) is fitted to the aggregate outcome trajectories of these subpopulations, then used to project counterfactual outcomes under full treatment and full control with composition set to $q^S = 1$ (human-only). The difference between these projected trajectories yields the estimated human total treatment effect (H-TTE).
  • Figure 2: Estimated human total treatment effect (H-TTE) on engagement ($Y_{i,t} \in \{0,\ldots,4\}$) across 16 rounds (4 warmup + 12 main). Algorithm 1 is shown under three prior-quality configurations ($a \in \{0.7, 0.8, 0.9\}$, $\sigma = 0.15$), alongside five baselines. The ground-truth human TTE (black) stabilizes near $0.5$, while baseline estimators remain near zero because positive human responses and negative AI responses cancel and because of network interference. Lines show means and shaded bands denote $\pm 1$ standard error over 10 seeds.
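The cancellation noted in the Figure 2 caption, and the reason variation in composition helps, can be seen in a deliberately simplified sketch. It assumes no network interference and no time dynamics, so it is not the paper's ESE-based algorithm: each group's treated-minus-control contrast is modeled as a composition-weighted mix of a human effect and an AI effect. The effect sizes, the compositions, and the function name are hypothetical.

```python
import numpy as np

def effects_from_group_contrasts(q, delta):
    """Least-squares fit of delta_g = q_g * tau_H + (1 - q_g) * tau_A.

    q     : expected human share of each group
    delta : treated-minus-control mean outcome of each group
    Returns (tau_H, tau_A) under the no-interference mixture assumption.
    """
    q = np.asarray(q, dtype=float)
    design = np.column_stack([q, 1.0 - q])
    coef, *_ = np.linalg.lstsq(design, np.asarray(delta, dtype=float), rcond=None)
    return float(coef[0]), float(coef[1])

# Hypothetical type-specific effects with opposite signs.
tau_h, tau_a = 0.5, -0.5

# Three groups with different expected human composition.
q = np.array([0.2, 0.5, 0.8])
delta = q * tau_h + (1.0 - q) * tau_a

# The evenly mixed group shows no effect at all: the two responses cancel.
pooled_contrast = float(delta[1])

est_h, est_a = effects_from_group_contrasts(q, delta)
print(f"contrast in the 50/50 group: {pooled_contrast:.3f}")
print(f"recovered human effect: {est_h:.3f}, AI effect: {est_a:.3f}")
```

With at least two distinct compositions the two-column design has full rank, which is what lets the fit separate the human effect from the AI effect; with a single composition only their weighted sum is visible.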

Theorems & Definitions (4)

  • Theorem 1: Experimental State Evolution (ESE)
  • Remark 1
  • Remark 2
  • Theorem 2: Consistency of Algorithm 1