A Turkish Educational Crossword Puzzle Generator

Kamyar Zeinalipour, Yusuf Gökberk Keptiğ, Marco Maggini, Leonardo Rigutini, Marco Gori

TL;DR

The paper addresses the lack of Turkish-language educational crossword tools and demonstrates how large language models can autogenerate clues and layouts. It introduces two datasets (TAC for Turkish answer-clue pairs and T4TAC for text-based clue generation with category labels) plus a full crossword-generation system that supports keyword- and text-driven inputs. Fine-tuning GPT-3.5-Turbo and Llama-2 models on these datasets, combined with a schema-driven layout algorithm and a scoring rule $\mathrm{Score} = (FW + 0.5 \cdot LL) \times FR \times LR$, yields clues and puzzles of educational quality, with human evaluators confirming meaningful performance. The work contributes open datasets and an accessible tool for Turkish education, and future work aims to extend to more languages and more advanced clue-generation capabilities.
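As a rough illustration of the scoring rule quoted above, the following Python sketch evaluates the formula for a candidate layout and picks the highest-scoring one. This summary does not define the four terms, so the sketch treats `fw`, `ll`, `fr`, and `lr` as opaque numeric inputs. The names `LayoutStats`, `layout_score`, and `best_layout` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LayoutStats:
    """Statistics of one candidate layout (hypothetical container).

    The fields mirror the symbols FW, LL, FR, and LR in the scoring rule;
    their exact definitions are not given in this summary.
    """
    fw: float
    ll: float
    fr: float
    lr: float


def layout_score(stats: LayoutStats) -> float:
    """Compute Score = (FW + 0.5 * LL) * FR * LR for one layout."""
    return (stats.fw + 0.5 * stats.ll) * stats.fr * stats.lr


def best_layout(candidates: Iterable[LayoutStats]) -> LayoutStats:
    """Return the candidate with the highest score (assumes a non-empty input)."""
    return max(candidates, key=layout_score)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    a = LayoutStats(fw=10, ll=8, fr=0.5, lr=0.5)
    b = LayoutStats(fw=12, ll=4, fr=0.5, lr=0.25)
    print(layout_score(a), layout_score(b))
    print(best_layout([a, b]))
```

Because the two ratio terms multiply the additive part, a layout with many words can still score below a smaller one if its ratios are low, which is what the made-up example shows.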

Abstract

This paper introduces the first Turkish crossword puzzle generator designed to leverage the capabilities of large language models (LLMs) for educational purposes. In this work, we introduce two purpose-built datasets: one with over 180,000 unique answer-clue pairs for generating relevant clues from a given answer, and another with over 35,000 samples containing text, answer, category, and clue data, aimed at producing clues for specific texts and keywords within certain categories. Beyond entertainment, this generator serves as an interactive educational tool that enhances memory, vocabulary, and problem-solving skills. It is a notable step in AI-enhanced education, merging game-like engagement with learning and setting new standards for interactive, intelligent learning tools in Turkish.

Paper Structure (12 sections, 7 figures, 1 table)

Figures (7)

  • Figure 1: The dataset entries are showcased visually through the distribution of answer lengths. Blue bars represent all answer-clue pairs, green bars show the frequency of unique answers, and red bars display the frequency of unique answer-clue pairs.
  • Figure 2: Diagram of the steps followed in the construction of the T4TAC dataset.
  • Figure 3: The prompt utilized in the study.
  • Figure 4: (i) Word and character length distributions for contexts, outputs, and keywords. (ii) Category distributions of T4TAC. (iii) Human evaluation for T4TAC.
  • Figure 5: The scheme of the crossword generation system.
  • ...and 2 more figures