Exploration vs. Fixation: Scaffolding Divergent and Convergent Thinking for Human-AI Co-Creation with Generative Models

Chao Wen, Tung Phung, Pronita Mehrotra, Sumit Gulwani, Roger E. Beaty, Tomohiro Nagashima, Adish Singla

Abstract

Generative AI has democratized content creation, but popular chatbot-based interfaces often prioritize execution, generating fully rendered artifacts right away. This can lead to premature convergence and design fixation, in which users become anchored to initial outputs. Recent work has proposed new interfaces that address this issue by supporting exploration, though the exploration is typically constrained to be semantically close to a user's initial task framing, potentially limiting the creativity of the outcomes. We examine an approach grounded in the Geneplore model of creative cognition and instantiate it in a human-AI co-creation system, HAICo, for creative image generation. HAICo explicitly structures the creative process into two switchable modes: DIVERGENT mode scaffolds broad exploration of remote conceptual ideas; CONVERGENT mode supports targeted refinement of selected ideas. Through a within-subjects study (N=24) on a poster image creation task, we demonstrate that HAICo outperforms ChatGPT across multiple dimensions of creativity and usability. Our results highlight the critical need to shift from purely execution-focused chatbots to scaffolded co-creation systems that actively guide exploration and foster the creative process.

Paper Structure

This paper contains 57 sections, 15 figures, 5 tables.

Figures (15)

  • Figure 1: Generalization of Fig. \ref{fig.teaser}, exemplifying a user trajectory when using our two-mode approach to solve a task. denote an idea, and denote an artifact.
  • Figure 2: HAICo's interfaces. In the Divergent mode, the user starts by entering a prompt, after which HAICo generates a set of Idea Cards, each with a title, thumbnail, description, background (shown on hover of the question mark), and category tags. The user can continue broadening the idea space by clicking "More Ideas" or by manually creating their own idea card, for example, by creating "Yosemite Tunnel View" as a new idea inspired by "The Japanese Forest Bathing Path." Upon selecting an idea, the user can edit it with the pencil icon or generate an image from that idea with the spark icon. To refine an image, clicking the "Refine" button (Fig. \ref{fig.system.interface}F) opens a new refinement tab; multiple tabs can be opened in parallel. In the Convergent mode, the user can submit a refinement prompt, after which the system generates parameters (whose meanings are shown on hover of the question mark) and options that the user can select (options are also editable); the user can then generate a new image variation. The Image Library shows the initial image alongside its refined variations, and any image can be further refined via the icon in its bottom-right corner. The trajectory in Fig. \ref{fig.teaser} maps to these UI operations as follows: $\rightarrow$, $\rightarrow$, $\rightarrow$, $\rightarrow$, $\rightarrow$, $\rightarrow$, and $\rightarrow$.
  • Figure 3: User Study Procedure (figure adapted from 10.1145/3706598.3713375).
  • Figure 4: Results for RQ1. (a) Creativity Support Index (CSI) scores across five dimensions: HAICo scored significantly higher than ChatGPT on all dimensions (all $W < 30.0$, all $p < 0.002$). (b) System usability (UMUX-Lite): HAICo scored significantly higher than ChatGPT overall ($M = 81.25$ vs. $64.24$; $W = 17.0$, $p < 0.001$). (c) Final downloaded image quality across four dimensions: HAICo produced significantly more novel images ($M = 3.22$ vs. $2.41$; $W = 0.0$, $p < 0.001$) and more diverse image sets ($M = 0.48$ vs. $0.36$; $W = 26.0$, $p = 0.001$), while fluency and usefulness were not significantly different across systems.
  • Figure 5: Example final posters created during the study for the task "Spend Less Time on Phones." Posters (a) and (b) were created with ChatGPT, and (c) and (d) were created with HAICo. For each system, we show the highest overall-scoring poster and a median overall-scoring poster, where the overall score is the sum of the novelty and usefulness scores. The (novelty, usefulness) scores for posters (a)-(d) are $(2.4, 3.8)$, $(2.0, 3.8)$, $(4.8, 3.8)$, $(3.2, 3.4)$, respectively. Poster (c) achieved a high novelty score (4.8/5) by first capturing attention with "FREE FOOD!!!," then guiding viewers across the layout from the left corner to the right corner, where the key message appears: "Take a break from scrolling! Look around, feel alive." As viewers follow these cues, their attention shifts away from the central phone, subtly mirroring the act of spending less time on it. This staged redirection makes the message both novel and memorable.
  • ...and 10 more figures
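
The $W$ statistics reported in the Figure 4 caption, together with the within-subjects design, are consistent with a Wilcoxon signed-rank test on paired per-participant scores, although this excerpt does not name the test. Under that assumption, the following minimal Python sketch shows how such a $W$ could be computed. The function name `signed_rank_w` and the ratings are hypothetical and purely illustrative; they are not study data, and the sketch does not compute a p-value.

```python
from typing import Sequence


def signed_rank_w(a: Sequence[float], b: Sequence[float]) -> float:
    """Wilcoxon signed-rank statistic W = min(W+, W-) for paired samples.

    Assumption: zero differences are dropped and tied absolute
    differences receive their average rank.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    # Paired differences, discarding pairs with no difference.
    diffs = [x - y for x, y in zip(a, b) if x != y]
    # Indices of the differences, ordered by absolute magnitude.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return float(min(w_plus, w_minus))


# Hypothetical paired ratings from six participants (NOT data from the study).
system_a = [4, 5, 3, 5, 4, 4]
system_b = [3, 3, 2, 2, 3, 4]
print(signed_rank_w(system_a, system_b))  # 0.0
```

In this made-up example every non-zero difference favors the first system, so the smaller rank sum is zero. This is the pattern that a reported value such as $W = 0.0$ would indicate under the signed-rank interpretation.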