ToddlerAct: A Toddler Action Recognition Dataset for Gross Motor Development Assessment

Hsiang-Wei Huang, Jiacheng Sun, Cheng-Yen Yang, Zhongyu Jiang, Li-Yu Huang, Jenq-Neng Hwang, Yu-Ching Yeh

TL;DR

The paper addresses the lack of toddler-specific action recognition resources for assessing gross motor development. It introduces ToddlerAct, a real-world dataset with 495 videos, 556 tracklet-action pairs, and 70,235 frames across six gross motor actions, along with expert annotations and privacy protections. The authors benchmark both image-based (CLIP-based) and skeleton-based (ST-GCN, PoseConv3D) baselines. A full-temporal multi-layer perceptron (MLP) on CLIP features yields the best top-1 accuracy (67.6%), zero-shot CLIP performs poorly (20.5%), and skeleton-based methods remain competitive due to accurate pose estimation. The work demonstrates the value of domain-specific data for toddler assessment, provides baselines for future research, and highlights ethical considerations and directions for improving robustness and longitudinal understanding in early development monitoring.

Abstract

Assessing gross motor development in toddlers is crucial for understanding their physical development and identifying potential developmental delays or disorders. However, existing datasets for action recognition primarily focus on adults, lacking the diversity and specificity required for accurate assessment in toddlers. In this paper, we present ToddlerAct, a toddler gross motor action recognition dataset, aiming to facilitate research in early childhood development. The dataset consists of video recordings capturing a variety of gross motor activities commonly observed in toddlers under three years old. We describe the data collection process, annotation methodology, and dataset characteristics. Furthermore, we benchmark multiple state-of-the-art methods, including image-based and skeleton-based action recognition methods, on our dataset. Our findings highlight the importance of domain-specific datasets for accurate assessment of gross motor development in toddlers and lay the foundation for future research in this critical area. Our dataset will be available at https://github.com/ipl-uw/ToddlerAct.

Paper Structure (22 sections, 6 equations, 6 figures, 3 tables)

This paper contains 22 sections, 6 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Visualization samples of the six different actions contained in our dataset. All toddlers' faces are masked to protect privacy.
  • Figure 2: Number of videos per action in our ToddlerAct dataset.
  • Figure 3: The overall pipeline of a simple CLIP-based action recognition baseline and the two different classification head designs. Both methods use a frozen image encoder from CLIP. The first method incorporates a temporal pooling module (maximum pooling or average pooling) followed by a linear classifier, while the other uses a multi-layer perceptron to predict the action.
  • Figure 4: A visualization example of our annotation. The ToddlerAct annotations include video-level bounding boxes for each tracklet and the corresponding action. When a video contains multiple toddlers, we include the tracking ID and the corresponding action annotation for each.
  • Figure 5: Confusion matrices for each of the tested baselines: CLIP with a multi-layer perceptron (left), ST-GCN (center), and PoseConv3D (right).
  • ...and 1 more figure
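
The two classification heads described in the Figure 3 caption can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the per-frame features below are plain lists standing in for frozen CLIP image embeddings, and the function names, the flattening of all frames before the MLP, the single hidden layer, and the ReLU activation are all assumptions made for illustration.

```python
def temporal_pool(frame_features, mode="avg"):
    """Collapse T per-frame feature vectors (T x D) into one D-dim vector."""
    if not frame_features:
        raise ValueError("need at least one frame")
    columns = list(zip(*frame_features))  # one tuple per feature dimension
    if mode == "avg":
        return [sum(col) / len(frame_features) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode}")


def linear(x, weights, bias):
    """Affine layer: weights has shape (out x in), bias has length out."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def pooled_linear_head(frame_features, weights, bias, mode="avg"):
    """Head 1: temporal pooling (max or average), then a linear classifier."""
    return linear(temporal_pool(frame_features, mode), weights, bias)


def mlp_head(frame_features, w1, b1, w2, b2):
    """Head 2 (assumed form): flatten all frames, then a two-layer MLP."""
    flat = [v for frame in frame_features for v in frame]
    return linear(relu(linear(flat, w1, b1)), w2, b2)


def predict(scores):
    """Return the index of the highest-scoring action class."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy input: 2 frames with 2-dim features (hypothetical stand-ins for CLIP
# embeddings) and 2 classes instead of the dataset's six actions.
frames = [[1.0, 2.0], [3.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
avg_scores = pooled_linear_head(frames, identity, [0.0, 0.0], mode="avg")
max_scores = pooled_linear_head(frames, identity, [0.0, 0.0], mode="max")
print(predict(avg_scores), predict(max_scores))
```

The pooled head discards frame order, whereas the flattened MLP input keeps each frame's position, which is one plausible reading of why a head that sees the full temporal sequence could separate actions that pooling blurs together.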