VCEval: Rethinking What is a Good Educational Video and How to Automatically Evaluate It

Xiaoxuan Zhu, Zhouhong Gu, Sihang Jiang, Zhixu Li, Hongwei Feng, Yanghua Xiao

TL;DR

This work addresses the challenge of automatic evaluation of educational video quality by reframing the task as a multi-target, multiple-choice QA problem and implementing VCEval, a framework that leverages text extraction from multimodal video content and an LLM-based evaluator. It introduces a K12 video-course benchmark, with data collection, annotation, and a three-phase training protocol (prior unlearning, in-class teaching, and in-class testing) to produce interpretable, target-aware quality scores. The framework demonstrates superior alignment with human judgments at both video and target levels, outperforming traditional text-similarity baselines and even strong ChatGPT baselines under practical input limitations. The proposed approach offers a scalable, interpretable, and fair method for guiding learners, creators, and platforms toward higher-quality video teaching materials, with potential impact on content curation and course design. Key contributions include the three-principle evaluation framework, the VCEval methodology, and the K12 benchmark with demonstrated consistency with human annotations.

Abstract

Online courses have significantly lowered the barrier to accessing education, yet the varying content quality of these videos poses challenges. In this work, we focus on the task of automatically evaluating the quality of video course content. We have constructed a dataset with a substantial collection of video courses and teaching materials. We propose three evaluation principles and design a new evaluation framework, VCEval, based on these principles. The task is modeled as a multiple-choice question-answering task, with a language model serving as the evaluator. Our method effectively distinguishes video courses of different content quality and produces a range of interpretable results.

Paper Structure (33 sections, 13 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Top: Existing automated online course evaluation mainly focuses on the video attribute and video topic, failing to evaluate the video content's clarity in elucidating knowledge. Bottom: Our proposed framework for automated evaluation of the teaching content in online courses.
  • Figure 2: Framework of our proposed VCEval. VCEval is composed of three main components, all detailed in Sec. \ref{sec:method}: Dataset Preparation (Sec. \ref{sec:dp}), Take Lessons (Sec. \ref{sec:tl}), and Take Exams (Sec. \ref{sec:te}).
  • Figure 3: Data 1: An example of a video series. A series contains several videos on a certain subject and each video teaches certain knowledge units. Data 2: The collected teaching material with relevant teaching targets. Data 3: An example of human annotation.
  • Figure 4: Test accuracy for each knowledge unit of two geography series.
  • Figure 5: A case study for the result in Figure \ref{fig:case-bar} on Knowledge Unit ID 2, where the probabilities in prediction $z_{i}$ are the probabilities of options A, B, C, and D, respectively. The evidence is collected from the video transcripts.
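
The captions for Figures 4 and 5 refer to per-option probabilities for each multiple-choice question and to test accuracy per knowledge unit. The following is a minimal sketch of how such an aggregation might look, not the authors' implementation. It assumes each prediction is a list of four probabilities for options A to D and that the predicted answer is simply the most probable option. The names `predicted_option`, `accuracy_per_unit`, and `records`, as well as the sample data, are hypothetical.

```python
from collections import defaultdict

OPTIONS = "ABCD"  # assumed option order, matching "A, B, C, D" in the Figure 5 caption


def predicted_option(probs):
    """Return the letter of the option with the highest probability."""
    best_index = max(range(len(OPTIONS)), key=lambda i: probs[i])
    return OPTIONS[best_index]


def accuracy_per_unit(records):
    """Compute test accuracy for each knowledge unit.

    records: iterable of (unit_id, probs, gold_letter) tuples, where probs
    holds the probabilities of options A, B, C, D in that order.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for unit_id, probs, gold in records:
        total[unit_id] += 1
        if predicted_option(probs) == gold:
            correct[unit_id] += 1
    return {unit_id: correct[unit_id] / total[unit_id] for unit_id in total}


# Hypothetical data for illustration only; not taken from the paper.
records = [
    (1, [0.7, 0.1, 0.1, 0.1], "A"),  # correct: A is most probable
    (1, [0.2, 0.5, 0.2, 0.1], "C"),  # incorrect: B is most probable
    (2, [0.1, 0.1, 0.6, 0.2], "C"),  # correct: C is most probable
]
unit_accuracy = accuracy_per_unit(records)
```

Keeping the full probability vector rather than only the chosen letter preserves the information a case study like the one in Figure 5 would need, since the margin between options indicates how confident the evaluator was.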