Personalized Scientific Figure Caption Generation: An Empirical Study on Author-Specific Writing Style Transfer
Jaeyoung Kim, Jongho Lee, Hongjun Choi, Sion Jang
TL;DR
The paper tackles personalized figure caption generation by leveraging author-specific profile data from the same paper, addressing the trade-off between stylistic mimicry and caption informativeness. It introduces a two-stage pipeline: a caption-quality evaluator $f_{quality}$ filters the training data, and a multimodal caption generator $g_{caption}$ is then fine-tuned with author profiles $(F, P, M, O)$ and related figures. Empirical results show that richer profile context yields higher BLEU/ROUGE scores, and that there is a measurable tension between personalization and quality. This tension motivates a quality-aware training paradigm that jointly predicts caption quality and enables inference-time control via Predicted-Q versus Forced-Q6. The work reports competitive performance relative to larger models and highlights practical considerations for deploying caption automation systems that preserve author voice while maintaining high-quality scientific communication.
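The two-stage idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Example` type, the `toy_quality` scorer standing in for $f_{quality}$, the 1 to 6 score range, and the threshold are all assumptions made for illustration, and the summary does not specify how Predicted-Q and Forced-Q6 are realized. Here they are modeled simply as using the predicted quality label versus overriding it with 6.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Example:
    figure_id: str
    caption: str


def filter_by_quality(
    examples: List[Example],
    f_quality: Callable[[str], int],
    min_score: int,
) -> List[Example]:
    """Stage 1: keep only examples whose caption scores at least min_score."""
    return [ex for ex in examples if f_quality(ex.caption) >= min_score]


def choose_quality_label(predicted_q: int, forced_q: Optional[int] = None) -> int:
    """Inference-time control: use the predicted label unless one is forced."""
    return predicted_q if forced_q is None else forced_q


def toy_quality(caption: str) -> int:
    # Hypothetical stand-in for f_quality: a score in 1..6 that grows with
    # caption length. A real evaluator would be a learned model.
    return min(6, max(1, len(caption.split()) // 2))


examples = [
    Example("fig1", "Results."),
    Example("fig2", "Accuracy of each model on the test set as profile context grows."),
]

# Only captions scoring at least 4 would be passed on to fine-tune g_caption.
kept = filter_by_quality(examples, toy_quality, min_score=4)

predicted_mode = choose_quality_label(predicted_q=3)            # "Predicted-Q"
forced_mode = choose_quality_label(predicted_q=3, forced_q=6)   # "Forced-Q6"
```

In this sketch the forced mode ignores whatever quality the model predicts, which is one simple way to picture trading author-style fidelity for caption quality at inference time.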
Abstract
We study personalized figure caption generation using author profile data from scientific papers. Our experiments demonstrate that rich author profile data, combined with relevant metadata, can significantly improve the personalization performance of multimodal large language models. However, we also reveal a fundamental trade-off between matching author style and maintaining caption quality. Our findings offer valuable insights and future directions for developing practical caption automation systems that balance both objectives. This work was conducted as part of the 3rd SciCap challenge.
