Everyone prefers human writers, including AI

Wouter Haverals, Meredith Martin

TL;DR

This study investigates how attribution cues influence the aesthetic judgments of both human and AI evaluators, using Queneau's Exercises in Style as test material. Two complementary experiments demonstrate a robust pro-human bias that is amplified in AI evaluators and persists across diverse model architectures and literary styles. The findings also reveal a systematic inversion of evaluative criteria driven by provenance labels, suggesting that AI readers have internalized human cultural biases about creativity and authorship. This has significant implications for how AI-based evaluation systems are designed and interpreted, and for understanding the limits of algorithmic judgment in creative domains.

Abstract

As AI writing tools become widespread, we need to understand how both humans and machines evaluate literary style, a domain where objective standards are elusive and judgments are inherently subjective. We conducted controlled experiments using Raymond Queneau's Exercises in Style (1947) to measure attribution bias across evaluators. Study 1 compared human participants (N=556) and AI models (N=13) evaluating literary passages from Queneau versus GPT-4-generated versions under three conditions: blind, accurately labeled, and counterfactually labeled. Study 2 tested bias generalization across a 14$\times$14 matrix of AI evaluators and creators. Both studies revealed systematic pro-human attribution bias. Humans showed +13.7 percentage point (pp) bias (Cohen's h = 0.28, 95% CI: 0.21-0.34), while AI models showed +34.3 percentage point bias (h = 0.70, 95% CI: 0.65-0.76), a 2.5-fold stronger effect (P$<$0.001). Study 2 confirmed this bias operates across AI architectures (+25.8pp, 95% CI: 24.1-27.6%), demonstrating that AI systems systematically devalue creative content when labeled as "AI-generated" regardless of which AI created it. We also find that attribution labels cause evaluators to invert assessment criteria, with identical features receiving opposing evaluations based solely on perceived authorship. This suggests AI models have absorbed human cultural biases against artificial creativity during training. Our study represents the first controlled comparison of attribution bias between human and artificial evaluators in aesthetic judgment, revealing that AI systems not only replicate but amplify this human tendency.
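The effect sizes quoted in the abstract are Cohen's h values, which compare two proportions after an arcsine transformation. The sketch below illustrates that calculation and checks the reported 2.5-fold ratio. The abstract does not give the underlying preference rates, so the proportions 0.57 and 0.43 are hypothetical values chosen only for illustration, and the function name `cohens_h` is ours rather than the authors'.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between two arcsine-transformed proportions."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie in [0, 1]")
    return 2.0 * math.asin(math.sqrt(p1)) - 2.0 * math.asin(math.sqrt(p2))


# Hypothetical preference rates (NOT taken from the paper): AI-written text is
# preferred 57% of the time when labeled human and 43% when labeled AI,
# i.e. a 14 percentage point gap.
h_example = cohens_h(0.57, 0.43)

# Values reported in the abstract.
human_pp, ai_pp = 13.7, 34.3
human_h, ai_h = 0.28, 0.70

# Both ratios round to the "2.5-fold" figure quoted in the abstract.
fold_pp = ai_pp / human_pp
fold_h = ai_h / human_h

print(f"h for hypothetical rates: {h_example:.3f}")
print(f"fold ratio (pp): {fold_pp:.2f}, fold ratio (h): {fold_h:.2f}")
```

Near a 50% preference rate, h is roughly twice the raw difference in proportions, which is why a gap of about 14 percentage points corresponds to an h of about 0.28.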

Paper Structure

This paper contains 14 sections, 18 figures, 5 tables.

Figures (18)

  • Figure 1: Experimental design of Study 1 for testing attribution bias in literary style evaluation. The study used 30 matched pairs of stylistic exercises: human-authored versions from Queneau's Exercises in Style (translated by Barbara Wright) and AI-generated variants created by GPT-4 using minimal stylistic prompts. Participants were randomly assigned to one of three attribution conditions: blind (no authorship information shown), open-label (accurate attribution labels shown), or counterfactual (deliberately reversed labels where AI content is presented as human-authored and vice versa). Both human participants (N=556) and AI model evaluators (N=13 models across major providers) performed identical comparison tasks, selecting which passage better matched each target literary style. The between-subjects design with randomized presentation order controlled for position effects while measuring pure attribution bias through systematic label manipulation across identical textual stimuli.
  • Figure 2: Attribution bias in humans versus AI models. Humans (blue, N=2,780 responses) and AI models (red, N=3,488 aggregated responses from 13 models) evaluated literary passages under three conditions: blind (no labels), open-label (correct labels), and counterfactual (AI content mislabeled as human-authored). Y-axis shows AI content preference rate. Error bars = 95% CI; dashed line = no preference. Both show pro-human bias, with AI models exhibiting 2.5-fold stronger effects (+34.3pp vs +13.7pp, OR = 4.21 vs 1.75, both P$<$0.001).
  • Figure 3: Attribution bias across individual AI language models. Cohen's h effect sizes (with 95% confidence intervals) for attribution bias (counterfactual minus open-label conditions) across 13 individual AI models. The red dashed line indicates the human baseline (Cohen's h = 0.28 [CI: 0.21--0.34]). All models exhibit attribution bias exceeding the human baseline, with low between-model variance (CV = 0.42).
  • Figure 4: Style-specific susceptibility to authorship misinformation. Horizontal bars show percentage point changes in AI-generated content preference when this content was mislabeled as "Human-authored" versus correctly labeled as "AI-authored" across 30 Queneau literary styles. Blue bars = human evaluators; red bars = AI models. Positive values indicate increased preference (susceptible to misinformation); negative values indicate decreased preference (resistant to misinformation).
  • Figure 5: Cross-model attribution bias across AI architectures. Fourteen AI evaluator models judged literary content created by all 14 AI creators (N=17,596 responses across 196 evaluator-creator combinations) under three conditions: blind (no attribution labels), open-label (correct creator labels), and counterfactual (AI content mislabeled as human-authored). Y-axis shows AI content preference rate aggregated across all model combinations. Error bars = 95% CI; dashed line = no preference. Results demonstrate attribution bias across AI architectures (+25.8pp from open-label to counterfactual conditions, 95% CI: +24.1% to +27.6%, P$<$0.001).
  • ...and 13 more figures
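The captions for Figures 2 and 5 report odds ratios and 95% confidence intervals alongside percentage point differences. As a rough illustration of how such quantities relate, the sketch below computes an odds ratio and a Wald interval for a difference in proportions. The counts are hypothetical and do not come from the paper, and the Wald interval is an assumption made for simplicity; the captions do not say which interval method the authors used.

```python
import math


def odds_ratio(p_a: float, p_b: float) -> float:
    """Odds of the outcome under condition A divided by the odds under condition B."""
    return (p_a / (1.0 - p_a)) / (p_b / (1.0 - p_b))


def diff_with_wald_ci(k_a: int, n_a: int, k_b: int, n_b: int, z: float = 1.96):
    """Difference in proportions (A minus B) with a Wald confidence interval."""
    p_a, p_b = k_a / n_a, k_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1.0 - p_a) / n_a + p_b * (1.0 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se


# Hypothetical counts (NOT from the paper): AI-written passages chosen in
# 570 of 1000 counterfactual trials and 430 of 1000 open-label trials.
diff, lo, hi = diff_with_wald_ci(570, 1000, 430, 1000)
or_example = odds_ratio(570 / 1000, 430 / 1000)

print(f"difference: {100 * diff:+.1f}pp (95% CI: {100 * lo:+.1f} to {100 * hi:+.1f})")
print(f"odds ratio: {or_example:.2f}")
```

An odds ratio above 1 and a confidence interval that excludes zero both point in the same direction here: the passage is chosen more often under the counterfactual label than under the accurate one.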