Beyond Search Engines: Can Large Language Models Improve Curriculum Development?
Mohammad Moein, Mohammadreza Molavi Hajiagha, Abdolali Faraji, Mohammadreza Tavakoli, Gàbor Kismihòk
TL;DR
The paper tackles the problem of keeping online curricula current by proposing a framework in which large language models generate relevant learning topics for courses, taking course titles as input, and by evaluating the generated topics against YouTube-derived baselines. It builds a dataset from YouTube playlists across 25 learning areas, generates topic lists with GPT-4 and GPT-3.5 (with multiple samples), and measures alignment with ground-truth topics using the BERTScore $F_1$. GPT-4 achieves an $F_1$ of around $0.30$, slightly better than the YouTube baseline, while GPT-3.5 underperforms; this indicates potential for LLM-assisted curriculum design and also highlights recall limitations. The work contributes a reproducible dataset and an evaluation framework to advance AI-assisted curriculum development, while acknowledging the limitations of relying on YouTube as ground truth and suggesting broader data sources for future work.
Abstract
While online learning is growing and becoming widespread, the associated curricula often suffer from a lack of coverage and outdated content. In this regard, a key question is how to dynamically define the topics that must be covered to thoroughly learn a subject (e.g., a course). Large Language Models (LLMs) are candidates for addressing these curriculum development challenges. Therefore, we developed a framework and a novel dataset, built on YouTube, to evaluate LLMs' performance in generating learning topics for specific courses. The experiment was conducted across over 100 courses and nearly 7,000 YouTube playlists in various subject areas. Our results indicate that, in terms of BERTScore, GPT-4 can produce more accurate topics for the given courses than the topics extracted from YouTube video playlists.
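
To make the evaluation metric concrete, the following is a minimal sketch of the greedy-matching precision, recall, and $F_1$ that BERTScore is built on. It is not the authors' evaluation code: real BERTScore matches contextual token embeddings from a pretrained model, whereas the two-dimensional vectors here are toy values invented for illustration, and the names `cosine` and `greedy_f1` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_f1(reference, candidate):
    """BERTScore-style greedy matching without idf weighting.

    Recall: each reference vector is matched to its most similar candidate.
    Precision: each candidate vector is matched to its most similar reference.
    """
    recall = sum(max(cosine(r, c) for c in candidate) for r in reference) / len(reference)
    precision = sum(max(cosine(c, r) for r in reference) for c in candidate) / len(candidate)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy embeddings (assumed values, not from the paper's dataset):
# two ground-truth items, of which the candidate list covers only one.
reference = [[1.0, 0.0], [0.0, 1.0]]
candidate = [[1.0, 0.0]]

precision, recall, f1 = greedy_f1(reference, candidate)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the single generated item matches one reference item exactly, so precision is high while recall is penalized for the uncovered item, which is the kind of recall gap the summary above refers to.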
