A Large-Scale Dataset and Citation Intent Classification in Turkish with LLMs

Kemal Sami Karaca; Bahaeddin Eravcı

TL;DR

Turkish citation intent classification (CIC) has lacked large-scale resources and robust methodologies. The paper introduces the first public Turkish CIC dataset: 2,650 labeled citation sentences from Computer Science articles, annotated under a five-class Web of Science (WoS) taxonomy through a hybrid human-AI process, with citation contexts extracted precisely using CEX and GROBID. It benchmarks In-Context Learning (ICL) with manual prompts, then uses DSPy for automated prompt optimization and a stacked ensemble with an XGBoost meta-learner to reach 91.3% accuracy, demonstrating improved robustness over individual models. The work delivers a foundational resource and a reproducible framework for qualitative scholarly analysis in Turkish, with implications for broader non-English CIC research and downstream knowledge-graph applications.

Abstract

Understanding the qualitative intent of citations is essential for a comprehensive assessment of academic research, a task that poses unique challenges for agglutinative languages like Turkish. This paper introduces a systematic methodology and a foundational dataset to address this problem. We first present a new, publicly available dataset of Turkish citation intents, created with a purpose-built annotation tool. We then evaluate standard In-Context Learning (ICL) with Large Language Models (LLMs) and show that its effectiveness is limited by the inconsistent results that manually designed prompts produce. To address this core limitation, we introduce a programmable classification pipeline built on the DSPy framework, which systematically automates prompt optimization. For final classification, we employ a stacked generalization ensemble that aggregates the outputs of multiple optimized models to yield stable and reliable predictions. This ensemble, with an XGBoost meta-model, achieves a state-of-the-art accuracy of 91.3%. Ultimately, this study provides the Turkish NLP community and the broader academic community with a foundational dataset and a robust classification framework, paving the way for future qualitative citation studies.
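
The stacked generalization step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the base-model predictions are hard-coded toy values rather than outputs of DSPy-optimized LLMs, a tiny multiclass perceptron stands in for the XGBoost meta-model so the example needs only the standard library, and the five label names are assumed from the WoS citation classification scheme since the text here does not list them. The names `meta_features`, `PerceptronMeta`, and `held_out` are hypothetical.

```python
# Assumed label names for the five-class WoS taxonomy (not listed in the text).
LABELS = ["background", "basis", "support", "differ", "discuss"]


def meta_features(base_preds):
    """One-hot encode each base model's predicted label and concatenate them."""
    feats = []
    for pred in base_preds:
        feats.extend(1.0 if pred == label else 0.0 for label in LABELS)
    return feats


class PerceptronMeta:
    """Tiny multiclass perceptron, a stand-in for the XGBoost meta-model."""

    def __init__(self, n_features):
        self.w = {label: [0.0] * n_features for label in LABELS}

    def predict(self, x):
        scores = {
            label: sum(wi * xi for wi, xi in zip(w, x))
            for label, w in self.w.items()
        }
        # Ties are broken by the order of LABELS.
        return max(LABELS, key=lambda label: scores[label])

    def fit(self, X, y, epochs=20):
        for _ in range(epochs):
            for x, gold in zip(X, y):
                guess = self.predict(x)
                if guess != gold:
                    # Standard perceptron update: reward gold, penalize guess.
                    for i, xi in enumerate(x):
                        self.w[gold][i] += xi
                        self.w[guess][i] -= xi
        return self


# Toy held-out set: (predictions of three base models, gold label).
held_out = [
    (("background", "background", "background"), "background"),
    (("basis", "basis", "background"), "basis"),
    (("support", "basis", "support"), "support"),
    (("differ", "differ", "differ"), "differ"),
    (("discuss", "background", "discuss"), "discuss"),
]

X = [meta_features(preds) for preds, _ in held_out]
y = [gold for _, gold in held_out]
meta = PerceptronMeta(n_features=len(X[0])).fit(X, y)

# The meta-model maps a new vector of base predictions to a final label.
final_label = meta.predict(meta_features(("support", "basis", "support")))
```

The design point is that the meta-model is trained on the base models' outputs for held-out examples, so it can learn which model to trust for which class instead of weighting all votes equally.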
