Overcoming the Machine Penalty with Imperfectly Fair AI Agents

Zhen Wang; Ruiqi Song; Chen Shen; Shiya Yin; Zhao Song; Balaraju Battu; Lei Shi; Danyang Jia; Talal Rahwan; Shuyue Hu

TL;DR

This paper addresses the machine penalty, the tendency of humans to cooperate less with machines than with other humans, by testing whether AI agents powered by large language models can foster human cooperation in social dilemmas. The authors assign AI agents one of three personas (cooperative, fair, or selfish) and run a large pre-registered study with 1,152 participants in a ten-round prisoner's dilemma with pre-game communication. Only the fair persona overcomes the machine penalty, eliciting cooperation rates comparable to those in human–human interactions. Fair agents establish cooperation as a norm even though they occasionally break promises, and participants rate them as similar to humans in agency and intelligence and as more trustworthy, likable, cooperative, and fair. The study argues that success hinges on giving AI human-like social-cognitive intelligence rather than superficial anthropomorphism, with implications for designing AI that can partner with humans in complex social contexts.

Abstract

Despite rapid technological progress, effective human-machine cooperation remains a significant challenge. Humans tend to cooperate less with machines than with fellow humans, a phenomenon known as the machine penalty. Here, we show that artificial intelligence (AI) agents powered by large language models can overcome this penalty in social dilemma games with communication. In a pre-registered experiment with 1,152 participants, we deploy AI agents exhibiting three distinct personas: selfish, cooperative, and fair. Of these, only fair agents elicit human cooperation at rates comparable to human-human interactions. Analysis reveals that fair agents, similar to human participants, occasionally break pre-game cooperation promises, but nonetheless effectively establish cooperation as a social norm. These results challenge the conventional wisdom of machines as altruistic assistants or rational actors. Instead, our study highlights the importance of AI agents reflecting the nuanced complexity of human social behaviors, imperfect yet driven by deeper social cognitive processes.

Paper Structure (8 sections, 29 figures, 13 tables)

Introduction
Results
Discussion
Methods
System Prompt
Role-play Prompt
Communication Prompt
Decision-Making Prompt

Figures (29)

Figure 1: Fair agents, unlike cooperative or selfish agents, are as effective as humans at eliciting human cooperation, thereby overcoming the machine penalty. The left panel depicts participants' cooperation rates, while the right panel depicts the cooperation rates of agents. Participants' cooperation rates in the H-F treatment show no significant difference compared to those in the H-H treatment ($W=11096$, $p=0.3$, Cohen's $d=-0.11$). However, their cooperation rates in both the H-C and H-S treatments are significantly lower than those in the H-H treatment (H-C vs. H-H: $W=7240.5$, $p<10^{-6}$, Cohen's $d=-0.52$; H-S vs. H-H: $W=5552.5$, $p<10^{-12}$, Cohen's $d=-0.97$). The cooperation rates of fair agents are significantly lower than those of cooperative agents ($W=4089$, $p<10^{-16}$, Cohen's $d=-1.32$), but significantly higher than those of selfish agents ($W=20680$, $p<10^{-16}$, Cohen's $d=4.29$). Two-tailed Mann–Whitney $U$ tests are used for pairwise comparisons. The robustness of these results is further corroborated by a one-way ANOVA test (SI, Table S1).
Figure 2: All three types of agents frequently establish cooperation agreements with humans during the pre-game communication. However, humans often break cooperation promises, while fair agents also occasionally do so. Participants are most likely to establish cooperation agreements with fair agents, at a significantly higher rate than participants in all the other treatments (H-F vs. H-H: $\chi^{2}=20.3$, $p<10^{-5}$, Cohen's $h=0.20$; H-F vs. H-C: $\chi^{2}=22.7$, $p<10^{-5}$, Cohen's $h=0.18$; H-F vs. H-S: $\chi^{2}=30.8$, $p<10^{-7}$, Cohen's $h=0.21$). However, during the games, participants typically break their promises, though they do so significantly less frequently in the H-F treatment than in the H-C and H-S treatments (H-F vs. H-C: $\chi^{2}=32.95$, $p<10^{-8}$, Cohen's $h=-0.24$; H-F vs. H-S: $\chi^{2}=135.8$, $p<10^{-15}$, Cohen's $h=-0.49$). Fair agents break promises at a significantly higher rate than cooperative agents ($\chi^{2}=115.93$, $p<10^{-15}$, Cohen's $h=0.63$), but at a significantly lower rate than selfish agents ($\chi^{2}=968.9$, $p<10^{-15}$, Cohen's $h=-1.39$). Two-sample proportions $Z$ tests are used for pairwise comparisons. Statistical significance results of pairwise comparisons across treatments are provided in SI, Table S2.
Figure 3: Occasional promise breaches, exhibited by fair agents, are associated with the highest rates of human cooperation. Scatter points depict the cooperation rates of individual participants when interacting with agents. The curve represents a generalized linear model (GLM) that incorporates data from all three types of human-agent interactions. This model treats human cooperation rates as the dependent variable, and includes linear ($\text{Estimate} \pm \text{SE} = 2.87 \pm 1.28, z = 2.2, p = 0.02$), quadratic ($\text{Estimate} \pm \text{SE} = -9.65 \pm 3.37, z = -2.86, p < 0.01$), and cubic ($\text{Estimate} \pm \text{SE} = 5.96 \pm 2.37, z = 2.52, p = 0.01$) terms of agents' promise-breaking frequency as independent variables. The curve shows an initial increase in human cooperation rates as the frequency of agents' promise-breaking rises from zero, followed by a significant decrease, and then stabilization at higher frequencies of agents' promise-breaking.
Figure 4: Fair agents establish cooperative norms and are perceived as possessing experience, agency, and intelligence, while also being viewed as more trustworthy, likable, cooperative, and fair than humans. The top panels depict participants' post-experiment estimations for cooperation from other participants in the same treatment, whereas the bottom panels depict participants' post-experiment agreement levels for various human-like traits of their associates in the treatment. Participants in the H-F treatment estimate a higher level of cooperation from other participants than those in any other treatment (H-F vs. H-C: $W=13356$, $p<10^{-4}$, Cohen's $d=0.52$; H-F vs. H-H: $W=6786.5$, $p<10^{-6}$, Cohen's $d=0.65$; H-F vs. H-S: $W=16822$, $p<10^{-15}$, Cohen's $d=1.37$). Compared to humans, fair agents fall short in experience ($W=13373$, $p<10^{-4}$, Cohen's $d=-0.46$), but exhibit similar intelligence ($W=11465$, $p=0.11$, Cohen's $d=-0.12$) and agency ($W=9173$, $p=0.09$, Cohen's $d=0.23$). In addition, they are seen as more trustworthy ($W=4378.5$, $p<10^{-17}$, Cohen's $d=1.16$), likable ($W=5047.5$, $p<10^{-13}$, Cohen's $d=0.99$), fair ($W=5721.5$, $p<10^{-10}$, Cohen's $d=0.86$), and cooperative ($W=4145.5$, $p<10^{-18}$, Cohen's $d=1.22$) than humans. Two-tailed Mann–Whitney $U$ tests are used for pairwise comparisons. Statistical significance results of pairwise comparisons across each treatment and each dimension are provided in SI, Table S3.
Figure S1: Messages generated by fair agents are all perceived as high quality and are viewed more positively in clarity, concreteness and courteousness than those from humans under the label-informed setting. The box plot depicts participants' post-experiment agreement levels for associates' communication quality according to the 7C standard, namely, clarity, conciseness, concreteness, coherence, courteousness, correctness, and completeness. Compared to humans, fair agents generate messages with similar levels of conciseness ($W=11468$, $p=0.11$, Cohen's $d=-0.14$), coherence ($W=9462.5$, $p=0.19$, Cohen's $d=0.23$), correctness ($W=11125$, $p=0.27$, Cohen's $d=-0.11$), and completeness ($W=9214.5$, $p=0.09$, Cohen's $d=0.27$). In addition, the messages generated by fair agents are perceived as having greater clarity ($W=8738$, $p=0.02$, Cohen's $d=0.29$), concreteness ($W=7328.5$, $p<10^{-5}$, Cohen's $d=0.64$), and courteousness ($W=7939$, $p<10^{-3}$, Cohen's $d=0.52$) than those produced by humans. Two-tailed Mann–Whitney $U$ tests are used for pairwise comparisons. Statistical significance results of pairwise comparisons across each treatment and each dimension are provided in the SI tables on communication quality under the label-informed setting.
...and 24 more figures
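
The figure captions above report Mann–Whitney statistics with Cohen's $d$ for cooperation rates and ratings, and two-sample proportions tests with Cohen's $h$ for agreement and promise-breaking rates. As a reading aid, the following is a minimal sketch of how such quantities are commonly computed. It is not the authors' analysis code: the function names, the pooled-variance form of $d$, the convention that the statistic counts wins of the first sample, and all data values are assumptions made for illustration.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def mann_whitney_u(a, b):
    """U statistic of sample a: number of pairs with a_i > b_j, ties counted as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def cohens_h(p1, p2):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample proportions z statistic for x1/n1 versus x2/n2."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


# Hypothetical per-participant cooperation rates (NOT data from the paper).
group_a = [0.9, 0.7, 0.8, 0.6, 1.0]
group_b = [0.5, 0.6, 0.4, 0.7, 0.3]
print("Cohen's d:", round(cohens_d(group_a, group_b), 3))
print("U (group_a):", mann_whitney_u(group_a, group_b))

# Hypothetical counts of kept promises out of 100 opportunities per group.
z = two_proportion_z(60, 100, 80, 100)
print("Cohen's h:", round(cohens_h(0.6, 0.8), 3))
print("z:", round(z, 3), "z squared:", round(z * z, 3))
```

Under these conventions the sign of $d$ and $h$ follows the order of the comparison: a negative value means the first-named group has the lower mean or proportion. The squared $z$ statistic of the pooled test equals the one-degree-of-freedom $\chi^{2}$ statistic without continuity correction, which is one plausible reading of why the captions report $\chi^{2}$ values for tests described as $Z$ tests.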
