From Arabic Text to Puzzles: LLM-Driven Development of Arabic Educational Crosswords

Kamyar Zeinalipour, Mohamed Zaky Saad, Marco Maggini, Marco Gori

TL;DR

The work addresses the scarcity of Arabic educational tools by presenting an LLM-driven pipeline that generates crossword clues from Arabic text. It introduces the Arabic-Clue-Instruct dataset, built from Arabic Wikipedia content across multiple subjects, and leverages prompting and fine-tuning of GPT-3.5-Turbo and Llama3-8B-Instruct to produce contextually grounded clues. Evaluation combines ROUGE-based automatic metrics with human judgments, showing that fine-tuned models outperform baselines and that Llama3-8B-Instruct FT yields particularly strong clue quality. The approach offers a practical, open-source pathway for educators to generate Arabic educational crosswords and can be extended to broader subjects and languages.

Abstract

We present an Arabic crossword puzzle generator that works from a given text and utilizes advanced language models such as GPT-4-Turbo, GPT-3.5-Turbo, and Llama3-8B-Instruct. Developed specifically for educational purposes, the generator leverages a carefully compiled dataset named Arabic-Clue-Instruct, with over 50,000 entries encompassing text, answers, clues, and categories. The dataset is designed to support the generation of pertinent clues linked to specific texts and keywords within defined categories. This project addresses the scarcity of advanced educational tools tailored for the Arabic language, promoting enhanced language learning and cognitive development. By providing a culturally and linguistically relevant tool, our objective is to make learning more engaging and effective through gamification and interactivity. Integrating state-of-the-art artificial intelligence with contemporary learning methodologies, this tool can generate crossword puzzles from any given educational text, thereby facilitating an interactive and enjoyable learning experience. This tool not only advances educational paradigms but also sets a new standard in interactive and cognitive learning technologies. The model and dataset are publicly available.

Paper Structure

This paper contains 20 sections, 13 figures, 4 tables.

Figures (13)

  • Figure 1: The methodology of this study, comprising the following steps: (a) gathering data from Arabic Wikipedia; (b) refining and filtering the data to improve quality by eliminating content that is either too brief or excessively detailed; (c) developing prompts for creating educational Arabic crossword clues from the educational content; (d) employing GPT-4-Turbo to generate Arabic crossword clues from the refined data and the crafted prompts; (e) fine-tuning Large Language Models (LLMs) to produce Arabic clues tailored to the given context more effectively.
  • Figure 2: Prompt used in the study.
  • Figure 3: Word and Character Length Distributions for Contexts, Outputs, and Keywords.
  • Figure 4: Bar Plot Showing the Frequency of Twenty Categories within the Dataset.
  • Figure 5: Bar Plot Showing the Frequency of GPT-4 Ratings.
  • ...and 8 more figures
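
Steps (b) and (c) of the Figure 1 pipeline can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the word-count thresholds `MIN_WORDS` and `MAX_WORDS`, the `ClueExample` record, and the prompt wording in `build_prompt` are all hypothetical (the actual prompt is the one shown in Figure 2, which is not reproduced here). Only the record fields (text, answer, clue, category) and the idea of dropping overly brief or overly detailed passages come from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical thresholds: the paper only states that content that is
# too brief or excessively detailed is removed, not the actual cutoffs.
MIN_WORDS = 50
MAX_WORDS = 500


@dataclass
class ClueExample:
    """One dataset record with the fields named in the abstract."""
    text: str       # source passage (e.g. from Arabic Wikipedia)
    answer: str     # keyword that the clue should point to
    category: str   # subject category of the passage
    clue: str = ""  # filled in later by the generating model


def keep_passage(text: str, min_words: int = MIN_WORDS,
                 max_words: int = MAX_WORDS) -> bool:
    """Step (b): keep passages whose whitespace-token count is in range."""
    n_words = len(text.split())
    return min_words <= n_words <= max_words


def build_prompt(example: ClueExample) -> str:
    """Step (c): assemble a clue-generation prompt.

    The wording is an assumption for illustration, not the Figure 2 prompt.
    """
    return (
        "Write an educational Arabic crossword clue.\n"
        f"Category: {example.category}\n"
        f"Answer: {example.answer}\n"
        f"Text: {example.text}\n"
        "Clue:"
    )


def prepare(examples: Iterable[ClueExample],
            min_words: int = MIN_WORDS,
            max_words: int = MAX_WORDS) -> List[str]:
    """Filter the passages, then build one prompt per surviving record."""
    return [
        build_prompt(e)
        for e in examples
        if keep_passage(e.text, min_words, max_words)
    ]
```

The resulting prompts would then be sent to the generating model in step (d), and the returned clues stored in the `clue` field to form training pairs for step (e). Splitting on whitespace is a rough proxy for Arabic word counts; a proper tokenizer may be preferable in practice.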