Humanlike AI Design Increases Anthropomorphism but Yields Divergent Outcomes on Engagement and Trust Globally

Robin Schimmelpfennig, Mark Díaz, Vinodkumar Prabhakaran, Aida Davani

TL;DR

The paper investigates how humanlike AI design shapes anthropomorphism, engagement, and trust across diverse cultural contexts. In two large cross-national studies (N=3,500 total across 10 countries) built on open-ended interactions with a state-of-the-art chatbot, the authors find that users judge humanlikeness by applied, interactional cues rather than by abstract theoretical constructs. Experimentally manipulating design characteristics (DC) and conversational sociability (CS) increases anthropomorphism, but the downstream effects on trust and engagement are culturally contingent and vary substantially across populations. The findings challenge universal risk assumptions about humanlike AI and argue for culturally adaptive governance that accounts for local user contexts and actual interaction dynamics.

Abstract

Over a billion users across the globe interact with AI systems engineered with increasing sophistication to mimic human traits. This shift has triggered urgent debate regarding anthropomorphism, the attribution of human characteristics to synthetic agents, and its potential to induce misplaced trust or emotional dependency. However, the causal link between more humanlike AI design and subsequent effects on engagement and trust has not been tested in realistic human-AI interactions with a global user pool. Prevailing safety frameworks continue to rely on theoretical assumptions derived from Western populations, overlooking the global diversity of AI users. Here, we address these gaps through two large-scale cross-national experiments (N=3,500) across 10 diverse nations, involving real-time and open-ended interactions with an AI system. We find that when evaluating an AI's human-likeness, users focus less on the kind of theoretical aspects often cited in policy (e.g., sentience or consciousness) than on applied, interactional cues such as conversation flow or understanding the user's perspective. We also experimentally demonstrate that humanlike design levers can causally increase anthropomorphism among users; however, we do not find that humanlike design universally increases behavioral measures of user engagement and trust, as previous theoretical work suggests. Instead, part of the connection between human-likeness and behavioral outcomes is fractured by culture: specific design choices that foster self-reported trust in AI systems in some populations (e.g., Brazil) may trigger the opposite result in others (e.g., Japan). Our findings challenge prevailing narratives of inherent risk in humanlike AI design. Instead, we identify a nuanced, culturally mediated landscape of human-AI interaction, which demands that we move beyond a one-size-fits-all approach in AI governance.

Paper Structure

This paper contains 24 sections, 8 figures.

Figures (8)

  • Figure 1: Two-stage experimental design for measuring AI anthropomorphism and its downstream effects across user groups. In both studies, participants first engage in an open-ended, multi-turn interaction with a chatbot (GPT-4o, August 2024), followed by questionnaires and, in Study 2, behavioral tasks.
  • Figure 2: Prevalence of anthropomorphism across measured attributes. The figure shows the share of users whose responses show a tendency to anthropomorphize across several attributes. To aid interpretation, responses from the original 5-point scale (e.g., (1) "completely machine-like" to (5) "completely humanlike" for the "machine-like/humanlike" item) are collapsed into three categories. Scores of 1 and 2 are grouped as "Not Anthropomorphizing", scores of 4 and 5 are grouped as "Anthropomorphizing", and the score of 3 is shown as "Neutral". Attributes are sorted in ascending order by the share of "Anthropomorphizing" responses. The results show that for every characteristic, the largest response group was "Anthropomorphizing".
  • Figure 3: Cross-national variation in humanlike perception and preference. The figure shows the mean and distribution of 'humanlike' perception and preference by country. Responses were measured on a bipolar 5-point Likert scale (1="Completely machine-like" to 5="Completely humanlike"). Each row represents one country. The faint blue shaded area is a violin plot, illustrating the full distribution of all responses. The black dot indicates the mean score, and the horizontal bars represent the 95% confidence interval (CI) of the mean. The dashed vertical line indicates the neutral midpoint (3.0) of the scale. Blue arrows start at each country's "Humanlike" perception mean and encode preference relative to neutral: they point right when the country's average preference for human-likeness is above 3 (more humanlike) and left when it is below 3 (more machine-like). Arrow length is proportional to the difference between the mean preference score and the neutral midpoint ("Would you prefer to talk to an AI system that is less or more humanlike compared to the one you just talked to?") and is uniformly scaled for readability. Sample sizes were N=100 for each country, with the exception of the United States (N=200; total N=1,100).
  • Figure 4: Frequency of user-identified 'applied' and 'theoretical' aspects of AI anthropomorphism. Bars represent the percentage of participants (N=1,100) mentioning specific aspects in response to an open-ended prompt regarding the chatbot's human-likeness ("Was there something specific about the AI system that made you feel you were (or were not) talking to a human? If so, why? Please describe in detail."). Applied aspects (highlighted in purple) were identified through a bottom-up qualitative analysis of user feedback and subsequently used to develop our codebook. Theoretical aspects (shaded in gray) represent top-down constructs derived from established anthropomorphism scales. An LLM-based autorater identified these features across eight languages. The results indicate that users prioritize applied, interactional characteristics (e.g., conversation flow, perspective-taking) over abstract theoretical constructs (e.g., consciousness, possession of a soul) when evaluating human-likeness.
  • Figure 5: Effect of human-like AI design treatments on anthropomorphism. Points represent the coefficient estimates from a series of Ordinary Least Squares (OLS) regressions (showing coefficients for 'DC-high/CS-high' vs. 'DC-low/CS-low') and horizontal lines represent the 95% confidence intervals. Panel a) shows the treatment effects across all ten Likert-measured anthropomorphism items. Panel b) shows the heterogeneity analysis, specifically for the "humanlike" item, across all sampled countries.
  • ...and 3 more figures
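
The three-way grouping described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names, the `CATEGORIES` constant, and the toy responses (including the "friendly" attribute) are assumptions made for illustration only. It maps scores of 1 and 2 to "Not Anthropomorphizing", 3 to "Neutral", and 4 and 5 to "Anthropomorphizing", then orders attributes in ascending order by their "Anthropomorphizing" share.

```python
from collections import Counter

# Category labels as given in the Figure 2 caption.
CATEGORIES = ("Not Anthropomorphizing", "Neutral", "Anthropomorphizing")


def collapse(score):
    """Map one 5-point Likert score to one of the three caption categories."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"expected a score from 1 to 5, got {score!r}")
    if score <= 2:
        return "Not Anthropomorphizing"
    if score >= 4:
        return "Anthropomorphizing"
    return "Neutral"


def category_shares(scores):
    """Return the fraction of responses that falls into each category."""
    if not scores:
        raise ValueError("need at least one response")
    counts = Counter(collapse(s) for s in scores)
    return {label: counts.get(label, 0) / len(scores) for label in CATEGORIES}


def sort_attributes(responses):
    """Order attribute names ascending by their 'Anthropomorphizing' share."""
    return sorted(
        responses,
        key=lambda name: category_shares(responses[name])["Anthropomorphizing"],
    )


# Hypothetical toy responses, not data from the paper.
toy = {
    "friendly": [4, 5, 5, 3, 4, 5],
    "humanlike": [1, 2, 3, 4, 5, 4],
}

if __name__ == "__main__":
    for name in sort_attributes(toy):
        print(name, category_shares(toy[name]))
```

Collapsing the scale this way trades resolution for readability: a 4 and a 5 count the same, so the shares show which response group dominates without conveying how strongly respondents agreed.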