SciDef: Automating Definition Extraction from Academic Literature with Large Language Models
Filip Kučera, Christoph Mandl, Isao Echizen, Radu Timofte, Timo Spinde
TL;DR
SciDef presents an LLM-driven pipeline for automated extraction of definitions from scientific literature and introduces two benchmarks, DefExtra and DefSim, to evaluate extraction quality and definitional similarity. In experiments over 16 models and multiple prompting strategies, the authors show that multi-step prompting and DSPy-optimized prompts yield higher extraction quality, with NLI-based similarity providing the most reliable evaluation. The work reports that LLMs can recover a large majority of ground-truth definitions (86.4% of the test set), yet over-generation and relevance filtering remain key challenges for real-world deployment. By releasing DefExtra, DefSim, and SciDef, the study lays groundwork for scalable definitional taxonomy construction, while acknowledging cost and domain limitations that motivate further research.
Abstract
Definitions are the foundation of any scientific work, but with the significant increase in publication numbers, gathering the definitions relevant to a given keyword has become challenging. We therefore introduce SciDef, an LLM-based pipeline for automated definition extraction. We test SciDef on DefExtra and DefSim, novel datasets of human-extracted definitions and of definition-pair similarity, respectively. Evaluating 16 language models across prompting strategies, we demonstrate that multi-step and DSPy-optimized prompting improve extraction performance. To evaluate extraction, we test various metrics and show that an NLI-based method yields the most reliable results. We show that LLMs are largely able to extract definitions from scientific literature (86.4% of the definitions in our test set); yet future work should focus not just on finding definitions, but on identifying relevant ones, as models tend to over-generate them. Code and datasets are available at https://github.com/Media-Bias-Group/SciDef.
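To make the reported numbers concrete, the following is a minimal sketch of how a recall-style score (the share of ground-truth definitions recovered) and an over-generation ratio could be computed. It is not the SciDef implementation: the function names, the 0.5 threshold, and the example sentences are hypothetical, and a simple token-overlap score stands in for the NLI-based similarity described in the abstract.

```python
from typing import Callable, List


def token_jaccard(a: str, b: str) -> float:
    """Placeholder similarity: token-set Jaccard overlap.

    Assumption: this stands in for an NLI-based scorer; it is NOT an NLI model.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def definition_recall(
    gold: List[str],
    extracted: List[str],
    similarity: Callable[[str, str], float] = token_jaccard,
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> float:
    """Fraction of gold definitions matched by at least one extracted one."""
    if not gold:
        return 0.0
    matched = sum(
        1 for g in gold
        if any(similarity(g, e) >= threshold for e in extracted)
    )
    return matched / len(gold)


def over_generation_ratio(gold: List[str], extracted: List[str]) -> float:
    """Extracted definitions per gold definition; values above 1 suggest over-generation."""
    return len(extracted) / len(gold) if gold else 0.0


# Hypothetical toy data, for illustration only.
gold = [
    "media bias is slanted news coverage",
    "a definition is a statement of meaning",
]
extracted = [
    "media bias is slanted news coverage",
    "framing is the selection of aspects",
    "bias is bad",
]

recall = definition_recall(gold, extracted)
ratio = over_generation_ratio(gold, extracted)
```

Swapping `token_jaccard` for a model-backed scorer only requires passing a different `similarity` callable; the rest of the computation stays the same.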
