SciCapenter: Supporting Caption Composition for Scientific Figures with Machine-Generated Captions and Ratings

Ting-Yao Hsu, Chieh-Yang Huang, Shih-Hong Huang, Ryan Rossi, Sungchul Kim, Tong Yu, C. Lee Giles, Ting-Hao K. Huang

TL;DR

SciCapenter integrates AI-generated figure captions, multi-aspect quality ratings, and an editable workflow to support scientists in composing captions for scholarly figures. It automatically extracts figures and figure-mentioning paragraphs, generates long and short captions, and produces ratings with explanations that users can edit and resubmit. A user study with STEM Ph.D. students demonstrates reduced cognitive load during caption writing and yields design insights for future caption-writing tools. The work highlights the value of contextualized, actionable writing support for figures and has potential for broader deployment in academic writing environments.
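
The step of collecting figure-mentioning paragraphs can be pictured with a small sketch. This is not SciCapenter's actual extraction code, which the summary does not describe; it only assumes that the paragraphs are already available as plain strings and that mentions look like "Figure 3" or "Fig. 3". The function name `figure_mentioning_paragraphs` is hypothetical.

```python
import re


def figure_mentioning_paragraphs(paragraphs, figure_number):
    """Return the paragraphs that mention the given figure number.

    Assumption: mentions are written as "Figure N", "Fig. N", or "Fig N"
    (case-insensitive). Real papers use more variants (ranges, subfigures).
    """
    number = re.escape(str(figure_number))
    # \b stops "config 3" from matching; (?!\d) stops "Figure 31" matching N=3.
    pattern = re.compile(rf"\bfig(?:ure|\.)?\s*{number}(?!\d)", re.IGNORECASE)
    return [p for p in paragraphs if pattern.search(p)]


if __name__ == "__main__":
    # Made-up example text, not taken from any paper.
    sample = [
        "As shown in Figure 3, accuracy improves with more data.",
        "Fig. 31 reports an unrelated ablation.",
        "See fig 3 for the full breakdown.",
        "This paragraph mentions no figure.",
    ]
    for paragraph in figure_mentioning_paragraphs(sample, 3):
        print(paragraph)
```

A plain regular expression keeps the sketch dependency-free; a production pipeline would likely work on parsed PDF structure rather than raw strings.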

Abstract

Crafting effective captions for figures is important. Readers heavily depend on these captions to grasp the figure's message. However, despite a well-developed set of AI technologies for figures and captions, these have rarely been tested for usefulness in aiding caption writing. This paper introduces SciCapenter, an interactive system that puts together cutting-edge AI technologies for scientific figure captions to aid caption composition. SciCapenter generates a variety of captions for each figure in a scholarly article, providing scores and a comprehensive checklist to assess caption quality across multiple critical aspects, such as helpfulness, OCR mention, key takeaways, and visual properties reference. Users can directly edit captions in SciCapenter, resubmit for revised evaluations, and iteratively refine them. A user study with Ph.D. students indicates that SciCapenter significantly lowers the cognitive load of caption writing. Participants' feedback further offers valuable design insights for future systems aiming to enhance caption writing.

Paper Structure (19 sections, 4 figures, 1 table)

Figures (4)

  • Figure 1: Overview of the SciCapenter system interface. PDF Upload Panel (A): A drag-and-drop interface for uploading PDF files. Navigation Bar (B): A horizontal bar showing a list of figures extracted from the uploaded document. Figure Image (C): The main area displaying the image of the selected figure. Caption Editor (D): A text box for editing the caption of the selected figure. Caption Rating (F): A feedback system that allows GPT to rate the quality of the caption, represented by a star rating. Caption Analysis (Check Table) (E): Icons indicating the presence or absence of key elements in the caption, such as helpfulness or takeaway message. Explanation for the Rating (G): A textual explanation providing insight into why a particular star rating was given to the caption. Machine-generated Captions & Their Ratings (H): This section includes long and short captions generated by AI models, each accompanied by their respective star ratings. Figure-mentioning Paragraphs (I): Paragraphs in the document that mention the target figure, providing context or additional information.
  • Figure 2: Before the study, participants provided ten papers from their research domain that they either intended to read or had briefly skimmed but not read in depth. We processed these papers through SciCapenter, choosing six target figures and manually redacting their captions. Participants received the redacted PDFs in the user study and were asked to write captions for the figures.
  • Figure 3: Comparison of six different elements provided by SciCapenter. The left figure shows the mean rating and standard deviation on a five-point scale for each element. The check table and referred paragraph were rated highest, while the short and long captions had the lowest scores, indicating they were the elements participants favored least. The right figure shows a breakdown of the five-point scale, with different colors representing each rating. The short caption exhibits a more varied distribution of opinions.
  • Figure 4: Comparative evaluation of caption quality by three experts. Each caption type (Ground Truth, Summary Short, Summary Long, and GPT-4V) is ranked from 1 (highest) to 4 (lowest). Expert 1 and Expert 2 both ranked the Ground Truth caption first most frequently, while Expert 3 preferred Summary Short. Notably, Expert 3 rated GPT-4V the lowest, rarely ranking it first, whereas for both Expert 1 and Expert 2, GPT-4V was the second most frequent first-place choice. These variations reflect differing perspectives on caption quality and suggest that, while Ground Truth captions are generally preferred, the experts disagree substantially in how they rate the machine-generated captions.
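
The kind of per-expert tally summarized in Figure 4 can be sketched as follows. This is only an illustration of counting first-place ranks; the function name `rank_one_counts` is hypothetical, and the sample rankings are invented for the example and are not the study's data.

```python
from collections import Counter


def rank_one_counts(rankings):
    """Count how often each caption type receives rank 1.

    `rankings` is a list of dicts, one per evaluated figure, mapping a
    caption type to the rank an expert gave it (1 = best, 4 = worst).
    """
    counts = Counter()
    for ranking in rankings:
        for caption_type, rank in ranking.items():
            if rank == 1:
                counts[caption_type] += 1
    return counts


if __name__ == "__main__":
    # Invented rankings from one hypothetical expert, NOT results from the paper.
    hypothetical_expert = [
        {"Ground Truth": 1, "Summary Short": 3, "Summary Long": 4, "GPT-4V": 2},
        {"Ground Truth": 2, "Summary Short": 4, "Summary Long": 3, "GPT-4V": 1},
        {"Ground Truth": 1, "Summary Short": 2, "Summary Long": 4, "GPT-4V": 3},
    ]
    for caption_type, count in rank_one_counts(hypothetical_expert).most_common():
        print(f"{caption_type}: ranked first {count} time(s)")
```

Running the same tally once per expert and comparing the resulting counters is one simple way to surface the kind of disagreement the caption describes.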