Developing a Tutoring Dialog Dataset to Optimize LLMs for Educational Use

Menna Fateen; Tsunenori Mine

TL;DR

This study developed a synthetic tutoring dialog dataset, which human teachers evaluated, and used it to fine-tune a smaller LLM, demonstrating a viable, cost-effective approach for implementing LLM-based tutoring systems in educational settings.

Abstract

Recent advances in large language models (LLMs) have shown promise for scalable educational applications, but their use in dialog-based tutoring systems remains challenging because of the need for effective pedagogical strategies and the high cost of expert-curated datasets. Our study explores the use of smaller, more affordable LLMs for one-on-one tutoring in the context of solving reading comprehension problems. We developed a synthetic tutoring dialog dataset, which human teachers evaluated, and used it to fine-tune a smaller LLM. Furthermore, we conducted an interactive experiment comparing the performance of the fine-tuned model with that of a larger model in real-world tutoring scenarios. Our results show that the fine-tuned model performs on par with the larger model at a lower cost, demonstrating a viable, cost-effective approach for implementing LLM-based tutoring systems in educational settings.
