Leveraging Small LLMs for Argument Mining in Education: Argument Component Identification, Classification, and Assessment

Lucile Favero; Juan Antonio Pérez-Ortiz; Tanja Käser; Nuria Oliver

TL;DR

The paper investigates using small open-source LLMs to perform a full argument mining pipeline (segmentation, argument type classification, and argument quality assessment) on student essays, with a focus on local, privacy-preserving deployment. It compares few-shot prompting and fine-tuning across three models (Qwen 2.5 7B, Llama 3.1 8B, Gemma 2 9B) against encoder baselines and GPT-4o mini on the Feedback Prize dataset. Results show that fine-tuned small LLMs outperform state-of-the-art encoders in segmentation and type classification, while few-shot prompting is competitive for quality assessment; joint modeling brings additional gains. The work demonstrates that open-source LLMs are practical for real-time, personalized feedback on student writing and can run on local devices, which keeps student data private and makes scalable education tools feasible.

Abstract

Argument mining algorithms analyze the argumentative structure of essays, making them a valuable tool for enhancing education by providing targeted feedback on students' argumentation skills. While current methods often use encoder or encoder-decoder deep learning architectures, decoder-only models remain largely unexplored, offering a promising research direction. This paper proposes leveraging open-source, small Large Language Models (LLMs) for argument mining through few-shot prompting and fine-tuning. These models' small size and open-source nature ensure accessibility, privacy, and computational efficiency, enabling schools and educators to adopt and deploy them locally. Specifically, we perform three tasks: segmentation of student essays into arguments, classification of the arguments by type, and assessment of their quality. We empirically evaluate the models on the Feedback Prize - Predicting Effective Arguments dataset of essays written by students in grades 6-12 and demonstrate that fine-tuned small LLMs outperform baseline methods in segmenting the essays and determining the argument types, while few-shot prompting yields performance comparable to that of the baselines in assessing quality. This work highlights the educational potential of small, open-source LLMs to provide real-time, personalized feedback, enhancing independent learning and writing skills while ensuring low computational cost and privacy.
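
To make the few-shot prompting setup for the classification task more concrete, the following is a minimal sketch of how a prompt could be assembled and how a model's reply could be mapped back to a label. It is an illustration under stated assumptions, not the authors' prompt or code: the helper names `Example`, `build_few_shot_prompt`, and `parse_label` and the prompt wording are hypothetical. The label sets are those of the Feedback Prize - Predicting Effective Arguments dataset.

```python
from dataclasses import dataclass

# Label sets of the Feedback Prize - Predicting Effective Arguments dataset.
ARGUMENT_TYPES = (
    "Lead", "Position", "Claim", "Counterclaim",
    "Rebuttal", "Evidence", "Concluding Statement",
)
QUALITY_LEVELS = ("Ineffective", "Adequate", "Effective")


@dataclass
class Example:
    """One labeled demonstration shown to the model (hypothetical helper)."""
    segment: str
    label: str


def build_few_shot_prompt(examples, segment, labels=ARGUMENT_TYPES):
    """Assemble a plain-text few-shot prompt; the wording is an assumption."""
    lines = ["Classify the essay segment into one of: " + ", ".join(labels) + ".", ""]
    for ex in examples:
        lines += [f"Segment: {ex.segment}", f"Label: {ex.label}", ""]
    # The final block leaves the label empty for the model to complete.
    lines += [f"Segment: {segment}", "Label:"]
    return "\n".join(lines)


def parse_label(completion, labels=ARGUMENT_TYPES):
    """Return the known label the completion starts with, or None."""
    text = completion.strip().lower()
    for label in labels:
        if text.startswith(label.lower()):
            return label
    return None
```

Passing `QUALITY_LEVELS` as `labels` reuses the same two helpers for the quality assessment task. Segmentation is not covered by this sketch, since it requires predicting span boundaries rather than a single label.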
