Fast Low-parameter Video Activity Localization in Collaborative Learning Environments

Venkatesh Jatla, Sravani Teeparthi, Ugesh Egala, Sylvia Celedon Pattichis, Marios S. Pattichis

TL;DR

This paper targets long-duration video activity localization in collaborative learning environments. It develops a fast, low-parameter, modular system that detects typing and writing activities in AOLME classroom videos, associates each activity with the student performing it, and visualizes the results. The approach combines a video activity segment proposal network with separable, low-parameter 3D-CNN classifiers for typing and writing, supported by fast object tracking (Faster-RCNN for hands/keyboard and KCF projection for hands) and a two-pass labeling strategy for training on limited data. A key contribution is an interactive web-based activity map that links 3-second activity proposals to video timestamps, enabling long-term analysis across sessions and groups. The system achieves performance comparable to state-of-the-art methods with orders of magnitude fewer parameters (as low as 18.7K) and high inference speeds (up to several thousand frames per second), while giving education researchers practical tools to study engagement and collaboration in real classroom settings.
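To make the link between 3-second activity proposals and video timestamps concrete, the following is a minimal sketch and not the authors' implementation. It assumes proposals are indexed from zero and laid end to end without overlap; the function names `segment_to_timestamps` and `merge_active_segments` are hypothetical and chosen only for illustration.

```python
from datetime import timedelta

# Proposal length in seconds, as stated in the summary.
SEGMENT_SECONDS = 3


def segment_to_timestamps(segment_index, segment_seconds=SEGMENT_SECONDS):
    """Return (start, end) as H:MM:SS strings for a zero-based segment index.

    Assumption: segments are contiguous and non-overlapping.
    """
    start = timedelta(seconds=segment_index * segment_seconds)
    end = timedelta(seconds=(segment_index + 1) * segment_seconds)
    return str(start), str(end)


def merge_active_segments(active_indices, segment_seconds=SEGMENT_SECONDS):
    """Merge consecutive active segment indices into (start_s, end_s) intervals."""
    intervals = []
    for idx in sorted(set(active_indices)):
        start_s = idx * segment_seconds
        end_s = start_s + segment_seconds
        if intervals and intervals[-1][1] == start_s:
            # This segment starts where the previous interval ends: extend it.
            intervals[-1] = (intervals[-1][0], end_s)
        else:
            intervals.append((start_s, end_s))
    return intervals
```

Under these assumptions, segment 20 covers 0:01:00 to 0:01:03, and active segments 0, 1, 2, and 5 collapse into two intervals, 0 to 9 seconds and 15 to 18 seconds, which is the kind of span an activity map could draw on a timeline.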

Abstract

Research on video activity detection has primarily focused on identifying well-defined human activities in short video segments. The majority of the research on video activity recognition is focused on the development of large-parameter systems that require training on large video datasets. This paper develops a low-parameter, modular system with rapid inferencing capabilities that can be trained entirely on limited datasets without requiring transfer learning from large-parameter systems. The system can accurately detect specific activities and associate them with the students who perform them in real-life classroom videos. Additionally, the paper develops an interactive web-based application to visualize human activity maps over long real-life classroom videos.

Paper Structure

This paper contains 38 sections, 2 equations, 32 figures, and 13 tables.

Figures (32)

  • Figure 1: Spatio-temporal activity detection by SOTA on standard human activity detection dataset.
  • Figure 2: Typing and writing activities and expected visualization. The interactive activity map, which associates each activity with the person performing it, helps the user better understand the detected activities.
  • Figure 3: Sample activities from standard datasets. In each sample, the labeled activity is the main focus of the video.
  • Figure 4: Core components of standard activity recognition methods that capture temporal characteristics.
  • Figure 5: Core components of standard activity recognition methods that capture temporal characteristics.
  • ...and 27 more figures