YouLeQD: Decoding the Cognitive Complexity of Questions and Engagement in Online Educational Videos from Learners' Perspectives
Nong Ming, Sachin Sharma, Jiho Noh
TL;DR
The paper studies the cognitive complexity of learner questions in online educational videos by constructing YouLeQD, a large-scale dataset of 57,242 questions drawn from YouTube comments across five STEM subjects. It pairs a RoBERTa-based question detector with a Bloom's Taxonomy classifier, using LLM-powered data augmentation (GPT-4o) and knowledge distillation to train the classifiers, and adds an Irrelevant class to handle out-of-distribution inputs. Key findings: most learner questions sit at the Knowledge level, and engagement patterns vary by cognitive level and subject. Data augmentation yields mixed effects, while the Irrelevant class and human-annotated evaluation improve robustness. The work provides a publicly available dataset and a practical framework for building AI-assisted educational tools that analyze and respond to learner questions according to their cognitive complexity.
Abstract
Questioning is a fundamental aspect of education, as it helps assess students' understanding, promotes critical thinking, and encourages active engagement. With the rise of artificial intelligence in education, there is growing interest in developing intelligent systems that can automatically generate and answer questions and facilitate interactions in both virtual and in-person education settings. However, developing effective AI models for education requires a fundamental understanding of questioning. In this study, we created the YouTube Learners' Questions on Bloom's Taxonomy Dataset (YouLeQD), which contains learner-posed questions from YouTube lecture video comments. Along with the dataset, we developed two RoBERTa-based classification models that leverage Large Language Models: one to detect questions and one to analyze their cognitive complexity using Bloom's Taxonomy. The dataset and our findings provide insights into the cognitive complexity of learner-posed questions in educational videos and their relationship with interaction metrics. These insights can aid the development of more effective AI models for education and improve the overall learning experience for students.
