Human-AI Synergy Supports Collective Creative Search

Chenyi Li, Raja Marjieh, Haoyu Hu, Mark Steyvers, Katherine M. Collins, Ilia Sucholutsky, Nori Jacoby

TL;DR

The paper studies a controlled word-guessing task that balances open-endedness with an objective measure of task performance; the results suggest higher-order interaction effects, whereby agents adapt to each other's presence in hybrid groups.

Abstract

Generative AI is increasingly transforming creativity into a hybrid human-artificial process, but its impact on the quality and diversity of creative output remains unclear. We study collective creativity using a controlled word-guessing task that balances open-endedness with an objective measure of task performance. Participants attempt to infer a hidden target word; each guess is scored by its semantic similarity to the target, and participants also observe the best guess from previous players. We compare performance and outcome diversity across human-only, AI-only, and hybrid human-AI groups. Hybrid groups achieve the highest performance while preserving high diversity of guesses. Within hybrid groups, both humans and AI agents systematically adjust their strategies relative to single-agent conditions, suggesting higher-order interaction effects, whereby agents adapt to each other's presence. Although some performance benefits can be reproduced through collaboration between heterogeneous AI systems, human-AI collaboration remains superior, underscoring complementary roles in collective creativity.
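The abstract describes the game only in outline. The Python sketch below simulates one game (chain) along the lines of the Figure 1 caption: 10 rounds of 10 turns, with the best guess from previous rounds passed on as a hint at the start of each round. It rests on assumptions the text does not confirm: each round is played by a single player; a guess is scored by plain cosine similarity between its embedding and the target's (the Figure 1 caption mentions an additional multiplicative "hidden word score" factor, which is not modeled here); and the toy 2-D embeddings and random player are made up for illustration. All function and variable names are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = List[float]
# A player maps the current hint (best earlier guess, or None in round 1) to a guess.
Player = Callable[[Optional[str]], str]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score_guess(guess: str, target: str, embeddings: Dict[str, Vector]) -> float:
    """ASSUMPTION: score = cosine similarity to the target embedding.
    The extra multiplicative factor mentioned in Figure 1 is omitted."""
    return cosine_similarity(embeddings[guess], embeddings[target])


def play_chain(
    players: Sequence[Player],
    target: str,
    embeddings: Dict[str, Vector],
    turns_per_round: int = 10,
) -> Tuple[List[Tuple[int, str, float]], Optional[str]]:
    """Play one chain: one player per round; the best guess so far is passed on as a hint."""
    best_word: Optional[str] = None
    best_score = -math.inf
    history: List[Tuple[int, str, float]] = []
    for round_idx, player in enumerate(players):
        hint = best_word  # frozen at round start, so only earlier rounds inform the hint
        for _ in range(turns_per_round):
            guess = player(hint)
            score = score_guess(guess, target, embeddings)
            history.append((round_idx, guess, score))
            if score > best_score:
                best_word, best_score = guess, score
    return history, best_word


def make_random_player(vocab: Sequence[str], seed: int) -> Player:
    """Hypothetical baseline player: ignores the hint and guesses uniformly at random."""
    rng = random.Random(seed)
    return lambda hint: rng.choice(list(vocab))


# Toy 2-D "embeddings" with made-up values, for illustration only.
EMBEDDINGS: Dict[str, Vector] = {
    "satellite": [1.0, 0.0],
    "orbit": [0.9, 0.1],
    "moon": [0.7, 0.7],
    "apple": [0.0, 1.0],
}
VOCAB = ["orbit", "moon", "apple"]

players = [make_random_player(VOCAB, seed=i) for i in range(10)]  # 10 rounds
history, best = play_chain(players, "satellite", EMBEDDINGS)  # 10 x 10 = 100 guesses
```

A more realistic player would use the hint, for example by proposing words near it in embedding space; the seeded random baseline simply keeps the sketch reproducible.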

Paper Structure (16 sections, 8 figures)

Figures (8)

  • Figure 1: Experiment framework for collective creative search. (A) Participants attempt to infer a hidden target word ("satellite") and receive similarity score feedback over multiple turns. (B) The similarity score of each guessed word is computed as the product of the hidden word score and the cosine similarity between the guess and the hidden word. (C) In each round, participants receive the best guess from previous rounds as a hint. (D) Participants are embedded in a collective guessing game with a chain-like network, in which best-guess information is transmitted within each game (chain). Each game has 10 rounds of 10 turns, totaling 100 guesses per game. (E) Schematic of the experimental conditions.
  • Figure 2: Semantic exploration trajectories. All words were embedded with a Word2Vec model and projected to two dimensions using UMAP, shown here for one target word ("compass"). Colored trajectories show five games for each of the four conditions. For each game and round, we computed the mean coordinates of the 10 guesses and connected these round centroids from round 1 to round 10; line opacity increases with round index. Inset boxes show example rounds with all 10 guesses and their similarity scores; the best guess of each round is highlighted.
  • Figure 3: Performance and diversity. A. Individual performance, computed as the average of the maximal score across rounds. Error bars represent one standard error across participants. Asterisks *, **, and *** denote significance levels of 0.05, 0.01, and 0.001, respectively. To account for multiple comparisons (here and in the rest of the paper), only results that passed Benjamini–Hochberg False Discovery Rate (BH-FDR) correction are shown. B. Individual diversity, computed as 1 minus the pairwise cosine similarity among all guessed words within a round. C. Individual performance across rounds. Error bars represent one standard error across participants.
  • Figure 4: Collective performance-diversity relationship. Each dot represents one hidden word. The x-axis indicates the collective performance, and the y-axis shows the collective diversity. For each condition, a linear regression line is fit to the data points. The association between performance and diversity is quantified using Pearson’s correlation coefficient.
  • Figure 5: Indirect influence of AI on human behavior and of humans on AI in the Human-AI Hybrid condition. A. Individual performance: comparison of human performance in the purely human and Hybrid (Human–AI) conditions, and AI performance in the purely AI and Hybrid conditions. B. Same for lexical diversity.
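The Figure 3 and Figure 4 captions define the main measures only in words. The Python sketch below shows one way to compute them from per-round scores and guess embeddings. Two readings are assumptions rather than stated facts: "average of the maximal score across rounds" is taken as the per-round maximum averaged over rounds, and "pairwise cosine similarity" is taken as the mean over all pairs of guesses in a round. The captions do not define how collective performance and diversity are aggregated per hidden word, so `pearson_r` simply correlates two given lists. The Benjamini–Hochberg function is the standard step-up procedure, not necessarily the paper's exact testing pipeline. All function names are hypothetical.

```python
import itertools
import math
import statistics
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def individual_performance(round_scores: Sequence[Sequence[float]]) -> float:
    """ASSUMED reading of Figure 3A: each round's maximal score, averaged over rounds."""
    return statistics.fmean(max(scores) for scores in round_scores)


def round_diversity(guess_vectors: Sequence[Vector]) -> float:
    """ASSUMED reading of Figure 3B: 1 minus the MEAN pairwise cosine similarity in a round."""
    pairs = list(itertools.combinations(guess_vectors, 2))
    if not pairs:
        return 0.0  # fewer than two guesses: no pairs to compare
    return 1.0 - statistics.fmean(cosine_similarity(u, v) for u, v in pairs)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation (Figure 4); undefined if either input is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Standard BH step-up procedure: True where the hypothesis is rejected at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below k, even ones that failed their own threshold.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= max_rank
    return rejected
```

The step-up behavior matters: `benjamini_hochberg([0.02, 0.03, 0.035])` rejects all three hypotheses, even though 0.02 exceeds its own threshold of 0.05/3, because a higher-ranked p-value passes its threshold.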