Leveraging Author-Specific Context for Scientific Figure Caption Generation: 3rd SciCap Challenge

Watcharapong Timklaypachara, Monrada Chiewhawan, Nopporn Lekuthai, Titipat Achakulvisut

TL;DR

The paper tackles automatic scientific figure caption generation by combining context-grounded reasoning with author-specific stylistic adaptation. It introduces a two-stage framework. Stage 1 builds content-grounded captions using category-focused prompts optimized with DSPy's MIPROv2 and SIMBA, followed by a caption-candidate selection step. Stage 2 applies profile-informed few-shot prompting to align captions with the source paper's writing style. Experiments on the LaMP-Cap dataset show that category-focused prompts improve recall while keeping precision loss in check (ROUGE-1 recall +8.3%, precision -2.8%, BLEU-4 -10.9%). Profile-driven refinement yields substantial gains in BLEU (40–48%) and ROUGE precision (25–27%) at a modest cost in recall. Overall, combining contextual understanding with author-specific stylistic adaptation yields captions that are both scientifically accurate and stylistically faithful to the source, with practical implications for scalable, consistent scientific communication.
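To make the recall/precision trade-off concrete, here is a simplified ROUGE-1 sketch. It assumes whitespace tokenization, lowercasing, and no stemming, so its numbers will not match official ROUGE packages exactly; the example captions are invented for illustration and are not from LaMP-Cap. A caption that adds detail covers more of the reference (higher recall) but also contributes words the reference lacks (lower precision).

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict[str, float]:
    """Simplified ROUGE-1: lowercase, whitespace tokens, no stemming (an assumption)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped unigram overlap: each word counts at most min(count in cand, count in ref).
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical reference caption and two candidates of different length.
reference = "accuracy of model a and model b"
short = rouge1("accuracy of model a", reference)
longer = rouge1("accuracy of model a and model b across five training runs", reference)
print(short)   # every candidate word is in the reference, but coverage is partial
print(longer)  # covers the whole reference, but adds four unmatched words
```

The shorter caption scores perfect precision with recall 4/7, while the longer one reaches full recall with precision 7/11, which is the same direction of trade-off the category-focused prompts exhibit.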

Abstract

Scientific figure captions require both accuracy and stylistic consistency to convey visual information. Here, we present a domain-specific caption generation system for the 3rd SciCap Challenge that integrates figure-related textual context with author-specific writing styles using the LaMP-Cap dataset. Our approach uses a two-stage pipeline: Stage 1 combines context filtering, category-specific prompt optimization via DSPy's MIPROv2 and SIMBA, and caption candidate selection; Stage 2 applies few-shot prompting with profile figures for stylistic refinement. Our experiments demonstrate that category-specific prompts outperform both zero-shot and general optimized approaches, improving ROUGE-1 recall by +8.3% while limiting precision loss to -2.8% and BLEU-4 reduction to -10.9%. Profile-informed stylistic refinement yields 40–48% gains in BLEU scores and 25–27% in ROUGE. Overall, our system demonstrates that combining contextual understanding with author-specific stylistic adaptation can generate captions that are both scientifically accurate and stylistically faithful to the source paper.
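The following is a minimal sketch of how the two stages could fit together, assuming a generic prompt-to-text callable in place of a real LLM. The category names, prompt wording, the `overlap_with_context` selection score, and every function name are hypothetical. The paper optimizes its prompts with DSPy's MIPROv2 and SIMBA, which this sketch does not reproduce, and the placeholder score is not the paper's candidate-selection criterion.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Callable, Optional

# Any function mapping a prompt to generated text (an LLM call in practice; stubbed below).
Generator = Callable[[str], str]

# Hypothetical category-specific instructions. In the paper these prompts are
# optimized with DSPy (MIPROv2, SIMBA); the wording here is illustrative only.
CATEGORY_PROMPTS = {
    "line_chart": "Describe the plotted series, the axes, and the main trend.",
    "bar_chart": "Describe the compared groups and the measured quantity.",
    "default": "Describe what the figure shows and its main takeaway.",
}


@dataclass
class Figure:
    context: str   # filtered figure-related text, e.g. paragraphs that mention the figure
    category: str  # figure type used to pick a category-specific prompt


def overlap_with_context(caption: str, context: str) -> float:
    """Placeholder selection score (hypothetical): share of caption words found in the context."""
    words = caption.lower().split()
    vocab = set(context.lower().split())
    return sum(w in vocab for w in words) / len(words) if words else 0.0


def stage1_caption(fig: Figure, generate: Generator, n_candidates: int = 3,
                   score: Optional[Callable[[str], float]] = None) -> str:
    """Stage 1: category-specific prompt, several candidates, keep the best-scoring one."""
    instruction = CATEGORY_PROMPTS.get(fig.category, CATEGORY_PROMPTS["default"])
    prompt = f"{instruction}\n\nFigure context:\n{fig.context}\n\nCaption:"
    # Assumes `generate` samples (e.g. temperature > 0), so repeated calls differ.
    candidates = [generate(prompt) for _ in range(n_candidates)]
    scorer = score or (lambda c: overlap_with_context(c, fig.context))
    return max(candidates, key=scorer)


def stage2_refine(draft: str, profile_captions: list[str], generate: Generator) -> str:
    """Stage 2: few-shot prompt whose examples are captions from the same paper."""
    shots = "\n".join(f"- {c}" for c in profile_captions)
    prompt = (
        "Captions written by the same authors:\n"
        f"{shots}\n\n"
        "Rewrite the draft caption in their style without changing its content.\n"
        f"Draft: {draft}\nRewritten:"
    )
    return generate(prompt)


# Usage with stub generators standing in for an LLM.
drafts = cycle([
    "A plot.",
    "Accuracy of model A and model B over training epochs.",
    "Some figure.",
])
seen_prompts: list[str] = []


def stub_stage1(prompt: str) -> str:
    return next(drafts)


def stub_stage2(prompt: str) -> str:
    seen_prompts.append(prompt)
    return prompt.split("Draft: ")[1].split("\n")[0]  # echo the draft unchanged


fig = Figure("We compare the accuracy of model A and model B over training epochs.", "line_chart")
draft = stage1_caption(fig, stub_stage1)
final = stage2_refine(draft, ["Test accuracy across seeds.", "Loss curves for all models."], stub_stage2)
print(final)
```

Swapping the stubs for real model calls and the placeholder score for the actual selection criterion leaves the control flow unchanged: Stage 1 decides what the caption says, and Stage 2 only conditions its wording on captions from the same paper.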

Paper Structure

This paper contains 17 sections, 1 figure, and 3 tables.

Figures (1)

  • Figure 1: Overview of our multi-stage caption generation framework