Quantifying the Academic Quality of Children's Videos using Machine Comprehension

Sumeet Kumar; Mallikarjuna T.; Ashiqur Khudabukhsh

TL;DR

This work tackles the gap in objective assessment of children's YouTube videos by linking video content to middle-school textbook questions. It introduces a three-component pipeline comprising a dense multi-modal video retriever, a Longformer-based reading-comprehension model, and a neural/embedding-based multiple-choice extractor to estimate how many textbook questions a video can support. The authors validate their approach on a large YouTube Kids corpus and two QA datasets, showing that top channels can answer a substantial fraction of textbook questions (approximately $0.78$) through multi-modal cues. They analyze channel-level differences, reveal correlations with popularity, and discuss limitations related to curricular scope, arguing for broader applicability to other subjects and platforms.

Abstract

YouTube Kids (YTK) is one of the most popular kids' applications, used by millions of kids daily. However, various studies have highlighted concerns about the videos on the platform, such as the over-presence of entertaining and commercial content. YouTube recently proposed high-quality guidelines that include 'promoting learning' and proposed to use them in ranking channels. However, the concept of learning is multi-faceted, and it can be difficult to define and measure in the context of online videos. This research focuses on learning in terms of what's taught in schools and proposes a way to measure the academic quality of children's videos. Using a new dataset of questions and answers from children's videos, we first show that a Reading Comprehension (RC) model can estimate academic learning. Then, using a large dataset of middle school textbook questions on diverse topics, we quantify the academic quality of top channels as the number of children's textbook questions that an RC model can correctly answer. By analyzing over 80,000 videos posted on the top 100 channels, we present the first thorough analysis of the academic quality of channels on YTK.

Paper Structure (31 sections, 12 equations, 9 figures, 4 tables)

Introduction
Related Work
Concerns with YouTube Kids Videos
Reading Comprehension
Video Retrieval
Methodology
Problem Formulation
Multi-modal Video Retriever Model
Reading Comprehension Model for Generating Answers
Input Embedding Block
Attention Block
Output Block
Multiple-Choice Answer Extraction Model
Neural Network for Multiple Choice (NNMC)
Closest Language Embedding Model (CLEM)
...and 16 more sections

Figures (9)

Figure 1: A video-frame from a YouTube Kids video explaining the Solar System. Our paper attempts to quantify the academic quality of videos on the basis of visual and language content.
Figure 2: Proposed approach to estimate the academic quality of videos vis-a-vis questions and answers in children's textbooks. The proposed approach combines a multi-modal video retriever, a reading comprehension (RC) model, and an answer extraction (Multiple Choice AE) model.
Figure 3: Multi-modal video ranking model, + indicates concatenation of video transcript embeddings and video frame encodings.
Figure 4: Reading comprehension model with global and sliding attention windows.
Figure 5: Comparing RC models of varying passage length
...and 4 more figures
