Understanding How Paper Writers Use AI-Generated Captions in Figure Caption Writing

Ho Yin Ng, Ting-Yao Hsu, Jiyoo Min, Sungchul Kim, Ryan A. Rossi, Tong Yu, Hyunggu Jung, Ting-Hao 'Kenneth' Huang

TL;DR

This study investigates how paper authors use AI-generated captions when writing captions for their own figures. It addresses a gap in the literature, which has largely focused on reader-centered caption evaluation. In a user study, 18 researchers rewrote captions for two figures from their own recent work. For each figure, participants received three AI-generated captions, and the authors analyzed the caption-writing process through interaction analysis of video recordings. Key findings show that writers often copy an AI caption and then refine it, favor longer, detail-rich captions that blend textual and visual cues, and find AI less helpful for complex conceptual figures. About one-third of tasks involve multiple AI captions, and initial interactions typically reference full sentences. The results underscore the need for more sophisticated, interactive AI writing assistants that support writers' cognitive workflows across different figure types, informing design directions for better human-in-the-loop captioning tools and more effective scholarly communication.

Abstract

Figures and their captions play a key role in scientific publications. However, despite their importance, many captions in published papers are poorly crafted, largely due to a lack of attention by paper authors. While prior AI research has explored caption generation, it has mainly focused on reader-centered use cases, where users evaluate generated captions rather than actively integrating them into their writing. This paper addresses this gap by investigating how paper authors incorporate AI-generated captions into their writing process through a user study involving 18 participants. Each participant rewrote captions for two figures from their own recently published work, using captions generated by state-of-the-art AI models as a resource. By analyzing video recordings of the writing process through interaction analysis, we observed that participants often began by copying and refining AI-generated captions. Paper writers favored longer, detail-rich captions that integrated textual and visual elements but found current AI models less effective for complex figures. These findings highlight the nuanced and diverse nature of figure caption composition, revealing design opportunities for AI systems to better support the challenges of academic writing.

Paper Structure

This paper contains 30 sections, 10 figures, and 1 table.

Figures (10)

  • Figure 1: Overview of the user study procedure, which included interviews and two writing tasks. In each task, participants rewrote figure captions from their recently published papers. Participants were provided with the original paper's PDF (with the original caption redacted) and three AI-generated captions to assist in the rewriting process.
  • Figure 2: Representative examples of statistical and conceptual figures, with their sources detailed in the Appendix.
  • Figure 3: User interface for the figure caption writing task, showing: (1) 'Target Paper' - the hyperlink to the redacted PDF, (2) 'Target Figure' - the figure image for the writing task, with its figure number and page number in the redacted PDF, (3) 'Your Captions' - the user input area for caption writing, and (4) 'Suggested Captions' - AI-generated captions from three different configurations (Unlimited, Text-Only, 30-Word), presented in randomized order for each caption item.
  • Figure 4: Average number of interactions with AI-generated captions per writing task. Participants interacted with AI-generated captions an average of 2.13 times per session for statistical figures (SD=1.32) and 2.00 times for conceptual figures (SD=1.65).
  • Figure 5: Distribution of the four AI integration activity types for statistical and conceptual figures.
  • Figures 6–10 are not listed here.