UM_FHS at the CLEF 2025 SimpleText Track: Comparing No-Context and Fine-Tune Approaches for GPT-4.1 Models in Sentence and Document-Level Text Simplification
Primoz Kocbek, Gregor Stiglic
TL;DR
The paper investigates scientific text simplification at sentence and document levels by comparing no-context prompting and fine-tuning across GPT-4.1 variants on Cochrane-derived data. It uses SARI, BLEU, FKGL, and compression to evaluate performance, revealing that no-context GPT-4.1-mini provides the strongest and most consistent results, while fine-tuned smaller models offer limited gains and occasional strengths in document-level tasks. Cost considerations favor smaller fine-tuned models, but overall gains are modest and highly dependent on task granularity and prompting. The work highlights critical roles for model selection and prompt design in biomedical text simplification and points to future work on prompt strategy refinement and broader human evaluation.
Abstract
This work describes our submission to the CLEF 2025 SimpleText track Task 1, addressing both sentence- and document-level simplification of scientific texts. The methodology centered on the gpt-4.1, gpt-4.1-mini, and gpt-4.1-nano models from OpenAI. Two distinct approaches were compared: a no-context method relying on prompt engineering and a fine-tuned (FT) method applied across models. The gpt-4.1-mini model with no-context prompting demonstrated robust performance at both levels of simplification, while the fine-tuned models showed mixed results, highlighting the complexities of simplifying text at different granularities. In one case, gpt-4.1-nano-ft stood out at document-level simplification.
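
To make two of the evaluation measures named in the summary concrete, the following is a minimal sketch of FKGL (Flesch-Kincaid Grade Level) and a compression ratio. It is not the track's official scorer. The FKGL coefficients are the standard published ones; the syllable counter is a rough vowel-group heuristic, the word-based compression definition is an assumption, and the function names and example sentences are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level with the standard coefficients."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def compression_ratio(source: str, simplified: str) -> float:
    """Output length over input length, in whitespace tokens (assumed definition)."""
    return len(simplified.split()) / len(source.split())


# Hypothetical example pair, not taken from the Cochrane-derived data.
source = ("The intervention demonstrated statistically significant "
          "improvements in cardiovascular outcomes.")
simplified = "The treatment helped the heart."

print(f"FKGL source:     {fkgl(source):.2f}")
print(f"FKGL simplified: {fkgl(simplified):.2f}")
print(f"Compression:     {compression_ratio(source, simplified):.2f}")
```

A lower FKGL suggests easier reading, and a compression ratio below 1 means the output is shorter than the input. Neither measure checks whether the meaning was preserved, which is why they are reported alongside SARI and BLEU.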
