YouLeQD: Decoding the Cognitive Complexity of Questions and Engagement in Online Educational Videos from Learners' Perspectives

Nong Ming; Sachin Sharma; Jiho Noh

TL;DR

The paper studies the cognitive complexity of learner questions on online educational videos. It introduces YouLeQD, a large-scale dataset of 57,242 questions drawn from YouTube comments across five STEM subjects. Two RoBERTa-based classifiers are trained: one detects questions in comments, and the other assigns each question a Bloom's Taxonomy level. Training relies on LLM-powered data augmentation (GPT-4o) and knowledge distillation, and an added Irrelevant class handles out-of-distribution inputs. Most learner questions fall at the Knowledge level, and engagement patterns vary by cognitive level and subject. Data augmentation has mixed effects, while the Irrelevant class and human-annotated evaluation enhance robustness. The dataset is publicly available, and the framework is intended to support AI-assisted educational tools that analyze and respond to learner questions according to their cognitive complexity.
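
To make the labeling scheme concrete, the following is a minimal sketch of how classifier scores over Bloom's Taxonomy levels plus an Irrelevant class could be turned into a per-subject distribution of cognitive levels. It is not the authors' implementation. The label order assumes the original six-level taxonomy (the summary names only the Knowledge level), and the function names, subject names, and score vectors are hypothetical placeholders rather than values from the dataset.

```python
from collections import Counter, defaultdict

# Assumption: the original six Bloom's Taxonomy levels, lowest to highest,
# plus the extra "Irrelevant" class used for out-of-distribution text.
BLOOM_LEVELS = ["Knowledge", "Comprehension", "Application",
                "Analysis", "Synthesis", "Evaluation"]
LABELS = BLOOM_LEVELS + ["Irrelevant"]


def predict_label(scores):
    """Return the label with the highest score (scores align with LABELS)."""
    if len(scores) != len(LABELS):
        raise ValueError("expected one score per label")
    best = max(range(len(scores)), key=scores.__getitem__)
    return LABELS[best]


def level_distribution(records):
    """Per-subject share of each Bloom level, skipping 'Irrelevant' items.

    records: iterable of (subject, scores) pairs, where scores is a list of
    seven classifier scores in LABELS order.
    """
    counts = defaultdict(Counter)
    for subject, scores in records:
        label = predict_label(scores)
        if label != "Irrelevant":
            counts[subject][label] += 1
    return {
        subject: {lvl: c[lvl] / sum(c.values()) for lvl in BLOOM_LEVELS}
        for subject, c in counts.items()
    }


# Toy, made-up scores for illustration only (not results from the paper).
toy_records = [
    ("subject_a", [0.70, 0.10, 0.05, 0.05, 0.05, 0.03, 0.02]),
    ("subject_a", [0.10, 0.60, 0.10, 0.05, 0.05, 0.05, 0.05]),
    ("subject_a", [0.01, 0.01, 0.01, 0.01, 0.01, 0.05, 0.90]),
    ("subject_b", [0.80, 0.05, 0.05, 0.04, 0.03, 0.02, 0.01]),
]
toy_distribution = level_distribution(toy_records)
```

Dropping Irrelevant predictions before normalizing keeps off-topic comments from diluting the per-level shares; whether the paper computes its distributions this way is not stated in the summary.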

Abstract

Questioning is a fundamental aspect of education, as it helps assess students' understanding, promotes critical thinking, and encourages active engagement. With the rise of artificial intelligence in education, there is a growing interest in developing intelligent systems that can automatically generate and answer questions and facilitate interactions in both virtual and in-person education settings. However, to develop effective AI models for education, it is essential to have a fundamental understanding of questioning. In this study, we created the YouTube Learners' Questions on Bloom's Taxonomy Dataset (YouLeQD), which contains learner-posed questions from YouTube lecture video comments. Along with the dataset, we developed two RoBERTa-based classification models leveraging Large Language Models to detect questions and analyze their cognitive complexity using Bloom's Taxonomy. This dataset and our findings provide valuable insights into the cognitive complexity of learner-posed questions in educational videos and their relationship with interaction metrics. This can aid in the development of more effective AI models for education and improve the overall learning experience for students.

Paper Structure (28 sections, 3 equations, 3 figures, 8 tables)

Introduction
Related Work
YouTube as a Source of Educational Content
Questioning in Education and Evaluation Methods
LLM as Automatic Annotators
Methodology
Data Acquisition: Transcripts and Comments
Question Extraction from Comments
Bloom's Taxonomy Classification Model for Questions
Data augmentation
Detecting Out-of-Distribution Examples and Evaluation Strategy
Experiments and Results
Question Extraction
Alignment of Questions with Bloom's Taxonomy
Distribution of BT Cognitive Levels across Subjects
...and 13 more sections

Figures (3)

Figure 1: Confusion Matrix of Human Annotators and Model Prediction (Human labels are aggregated and agreed upon by 3 annotators)
Figure 2: Question Type Distribution by Cognitive Complexity Across 5 Subjects
Figure 3: Popularity and Interaction Rate vs. Bloom's Taxonomy on Cognitive Level