Superhuman Game AI Disclosure: Expertise and Context Moderate Effects on Trust and Fairness
Jaymari Chua, Chen Wang, Lina Yao
TL;DR
This paper investigates how disclosing superhuman AI capabilities influences user trust, fairness, and toxicity in both adversarial (StarCraft II) and cooperative (LLM chat) settings. It introduces Persona Cards to generate synthetic, cognitively grounded user profiles and releases the Persona Cards Dataset to enable reproducible research on human–AI alignment. Across StarCraft II and LLM tasks, disclosures produced context- and expertise-dependent effects: novices often showed increased trust and reliance, while experts sometimes perceived greater toxicity or unfairness and adjusted their strategies accordingly. The work highlights that transparency is not a universal remedy; effective disclosure requires tailoring to user characteristics, domain norms, and explicit fairness objectives, and it offers practical design guidelines for adaptive transparency in AI systems.
Abstract
As artificial intelligence surpasses human performance in select tasks, disclosing superhuman capabilities poses distinct challenges for fairness, accountability, and trust. However, the impact of such disclosures on diverse user attitudes and behaviors remains unclear, particularly regarding potential negative reactions such as discouragement or overreliance. This paper investigates these effects using Persona Cards: a validated, standardized set of synthetic personas designed to simulate diverse user reactions and fairness perspectives. We conducted an ethics board-approved study (N=32) that used these personas to examine how capability disclosure influenced behavior toward a superhuman game AI in competitive StarCraft II scenarios. Our results reveal that transparency is double-edged: while disclosure could alleviate suspicion, it also provoked frustration and strategic defeatism among novices in cooperative scenarios, as well as overreliance in competitive contexts. Experienced and competitive players interpreted disclosure as confirmation of an unbeatable opponent and shifted to suboptimal goals. We release the Persona Cards Dataset, including profiles, prompts, interaction logs, and protocols, to foster reproducible research into human-aligned AI design. This work demonstrates that transparency is not a cure-all; successfully leveraging disclosure to enhance trust and accountability requires careful tailoring to user characteristics, domain norms, and specific fairness objectives.
