
EvoStruggle: A Dataset Capturing the Evolution of Struggle across Activities and Skill Levels

Shijia Feng, Michael Wray, Walterio Mayol-Cuevas

TL;DR

This work addresses the challenge of detecting how struggle evolves during skill acquisition by introducing EvoStruggle, a large-scale, multi-activity dataset with precise temporal struggle annotations. The dataset comprises 61.68 hours of untrimmed video, 2,793 videos, and 5,385 labeled struggle segments from 76 participants performing 18 tasks across four activities, each repeated five times to capture skill progression. Temporal Action Localization models are evaluated as struggle detectors, achieving a mean average precision of 34.56% across tasks and 19.24% across activities, demonstrating transferability of struggle cues while highlighting generalization challenges. EvoStruggle provides a rich resource for developing robust, generalizable assistance and tutoring systems that adapt as learners evolve.

Abstract

The ability to determine when a person struggles during skill acquisition is crucial for both optimizing human learning and enabling the development of effective assistive systems. As skills develop, the type and frequency of struggles tend to change, and understanding this evolution is key to determining the user's current stage of learning. However, existing manipulation datasets have not focused on how struggle evolves over time. In this work, we collect a dataset for struggle determination, featuring 61.68 hours of video recordings, 2,793 videos, and 5,385 annotated temporal struggle segments collected from 76 participants. The dataset includes 18 tasks grouped into four diverse activities (tying knots, origami, tangram puzzles, and shuffling cards) that represent different task variations. In addition, participants repeated each task five times to capture the evolution of their skill. We define the struggle determination problem as a temporal action localization task, focusing on identifying struggle segments and precisely localizing their start and end times. Experimental results show that Temporal Action Localization models can learn to detect struggle cues, even when evaluated on unseen tasks or activities. The models attain an overall average mAP of 34.56% when generalizing across tasks and 19.24% across activities, indicating that struggle is a transferable concept across various skill-based tasks, while leaving substantial room for improvement in struggle detection. Our dataset is available at https://github.com/FELIXFENG2019/EvoStruggle.
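Because the abstract frames struggle determination as temporal action localization, a predicted struggle segment is typically judged by how well its start and end times overlap an annotated one. The sketch below shows the standard temporal intersection-over-union (tIoU) measure that mAP-style localization metrics are usually built on. It is illustrative only: the function names and the 0.5 threshold are assumptions, and the thresholds and matching protocol actually used in the paper are not stated in this section.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments given in seconds."""
    # Overlap is zero when the segments do not intersect.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, gt, threshold=0.5):
    """Treat a prediction as correct if its tIoU reaches the threshold.

    The default threshold of 0.5 is a hypothetical choice for illustration.
    """
    return temporal_iou(pred, gt) >= threshold


# A predicted struggle segment from 2 s to 6 s against an annotation from 4 s to 8 s:
# the overlap is 2 s and the union is 6 s.
print(temporal_iou((2.0, 6.0), (4.0, 8.0)))
```

Average precision is then computed per tIoU threshold from the ranked list of matched and unmatched predictions, and mAP averages those values.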

Paper Structure

This paper contains 18 sections, 10 figures, and 4 tables.

Figures (10)

  • Figure 1: Overview of the EvoStruggle dataset. There are four activities: Tying Knots, Origami, Tangram, and Shuffling Cards, each consisting of four or five distinct tasks (left). Each task has five repetitions that show the evolution of skill (right, top to bottom). Percentages indicate the proportion of struggle duration relative to the total video recording time.
  • Figure 2: EvoStruggle Dataset Structure. There are four activities: Tying Knots, Origami, Tangram, and Shuffling Cards, each with four or five tasks. Each participant completed all tasks from an activity across five attempts to show the evolution of skill.
  • Figure 3: Our Struggle Annotation Pipeline consists of two stages. First, annotators watch the video and indicate moments whenever they believe the person is struggling. In the second stage, we cluster these moments into contiguous start/end times.
  • Figure 4: Heatmaps showing the Kernel Density Estimation (KDE) of struggle instance distributions. The x-axis/y-axis represents the normalized start/duration time of struggle relative to the total video recording time.
  • Figure 5: Number of struggle moments per video (left) and total recording time with struggle durations per attempt (right) in EvoStruggle.
  • ...and 5 more figures
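
The captions for Figures 1, 3, and 4 describe three small computations: grouping annotated struggle moments into contiguous segments, taking struggle duration as a fraction of recording time, and normalizing segment start and duration by video length. The sketch below illustrates one simple way to do each. The gap-based grouping rule and the `max_gap` parameter are assumptions for illustration; the clustering method actually used in the annotation pipeline is not specified in this section.

```python
def cluster_moments(moments, max_gap=2.0):
    """Group struggle timestamps (seconds) into (start, end) segments.

    Assumption: consecutive moments at most `max_gap` seconds apart belong
    to the same segment. `max_gap` is a hypothetical parameter.
    """
    segments = []
    for t in sorted(moments):
        if segments and t - segments[-1][1] <= max_gap:
            segments[-1][1] = t  # extend the current segment
        else:
            segments.append([t, t])  # start a new segment
    return [tuple(s) for s in segments]


def struggle_ratio(segments, video_duration):
    """Fraction of the recording covered by struggle segments."""
    return sum(end - start for start, end in segments) / video_duration


def normalize_segment(segment, video_duration):
    """Return (normalized start, normalized duration) of one segment."""
    start, end = segment
    return start / video_duration, (end - start) / video_duration


moments = [1.0, 2.0, 3.5, 10.0, 11.0]
segments = cluster_moments(moments)
print(segments)                                 # [(1.0, 3.5), (10.0, 11.0)]
print(struggle_ratio(segments, 20.0))           # 0.175
print(normalize_segment(segments[0], 20.0))     # (0.05, 0.125)
```

A single isolated moment yields a zero-length segment under this rule, so a real pipeline would likely pad segments or require a minimum number of moments.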