Automating Turkish Educational Quiz Generation Using Large Language Models

Kamyar Zeinalipour; Yusuf Gökberk Keptiğ; Marco Maggini; Marco Gori

TL;DR

This research uses Large Language Models, including GPT-4-Turbo, GPT-3.5-Turbo, Llama-2-7b-chat-hf, and Llama-2-13b-chat-hf, to automatically generate quiz questions and answers from Turkish educational content, opening new avenues for automated Turkish quiz generation.

Abstract

Crafting quizzes from educational content is a pivotal activity that benefits both teachers and students by reinforcing learning and evaluating understanding. In this study, we introduce a novel approach to generating quizzes from Turkish educational texts, a pioneering endeavor in educational technology tailored specifically to the Turkish educational context. We present a specialized dataset, Turkish-Quiz-Instruct, comprising an extensive collection of Turkish educational texts accompanied by multiple-choice and short-answer quizzes. This research leverages Large Language Models (LLMs), including GPT-4-Turbo, GPT-3.5-Turbo, Llama-2-7b-chat-hf, and Llama-2-13b-chat-hf, to automatically generate quiz questions and answers from Turkish educational content. Our work delineates the methodology for employing these LLMs on Turkish educational material, thereby opening new avenues for automated Turkish quiz generation. The study not only demonstrates the efficacy of such models for generating coherent and relevant quiz content but also sets a precedent for future research on automated educational content creation for languages other than English. The Turkish-Quiz-Instruct dataset is introduced as a valuable resource for researchers and practitioners exploring educational technology and language-specific applications of LLMs in Turkish. By addressing the challenges of quiz generation in a non-English context, specifically Turkish, this study contributes to the field of Turkish educational technology and provides insights into the potential of LLMs for educational purposes across diverse linguistic landscapes.

Paper Structure (15 sections, 7 figures, 2 tables)

Introduction
Related Work
Methodology
Turkish-Quiz-Instruct
Data Scraping
Data Cleaning and Filtering
Craft the prompt
Generating Educational multiple-answer questions
Evaluating Generated Data Quality
From LLMs to Turkish Educational Quizzes
Experiments
Fine-Tuning Configuration
Turkish educational multiple-choice questions generation
Turkish short-answer questions generation
Conclusion

Figures (7)

Figure 1: The methodology used in this study: (a) Data collection by scraping educational content in Turkish, covering various subjects such as biology, history, etc. (b) Data refinement and filtering to improve quality by removing overly short or excessively detailed content. (c) Prompt creation for generating Turkish quizzes based on the educational content. (d) Use of GPT-4-Turbo to produce quizzes from the collected data and configured prompts. (e) Fine-tuning of Large Language Models (LLMs) to generate Turkish educational quizzes from the given educational context.
Figure 2: Turkish Quiz Generation Prompt.
Figure 3: Token distribution of the Turkish educational context and of the Turkish educational multiple-choice questions generated using GPT-4-Turbo.
Figure 4: (a) Content Subject Distribution and (b) GPT-4 Ratings from the Human Evaluation
Figure 5: Sample Questions Created by Fine-Tuned Language Models.
...and 2 more figures
