
A Universal Action Space for General Behavior Analysis

Hung-Shuo Chang, Yue-Cheng Yang, Yu-Hsi Chen, Wei-Hsin Chen, Chien-Yao Wang, James C. Liao, Chien-Chang Chen, Hen-Hsen Huang, Hong-Yuan Mark Liao

TL;DR

This work addresses universal behavior analysis by learning a Universal Action Space (UAS) from large-scale human action data and using it as a frozen foundation for downstream animal-behavior tasks. The UAS is constructed with a Video Swin Transformer trained on Kinetics-600, producing a high-dimensional embedding that is projected into domain-specific subspaces for each task. Animal-behavior datasets (MammalNet and ChimpBehave) are analyzed via a lightweight linear probe atop the frozen UAS, achieving strong accuracy with substantial reductions in training time and parameters. The results demonstrate effective cross-domain transfer from human actions to animal behaviors and offer a resource-efficient approach for labs with limited compute; code is released on GitHub.
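The linear-probe setup described above can be sketched as follows. This is a minimal illustration, not the authors' released code. It assumes embeddings have already been extracted from the frozen backbone, and it replaces them with synthetic vectors from a hypothetical `make_fake_frozen_features` helper. The embedding size (64) is arbitrary and is not the actual UAS dimension. The probe is plain NumPy softmax regression. Only the weight matrix `W` and bias `b` are trained, so each task adds `dim * num_classes + num_classes` parameters. Reading each task's `W` as its "domain-specific subspace" projection is an interpretation for illustration only.

```python
import numpy as np


def make_fake_frozen_features(n_per_class, num_classes, dim, rng):
    """Hypothetical stand-in for embeddings from a frozen backbone (no real model is run)."""
    centers = rng.normal(size=(num_classes, dim))
    X = np.concatenate([c + 0.5 * rng.normal(size=(n_per_class, dim)) for c in centers])
    y = np.repeat(np.arange(num_classes), n_per_class)
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize each embedding
    return X, y


def train_linear_probe(features, labels, num_classes, lr=0.5, epochs=300, seed=0):
    """Fit softmax regression on frozen features; only W and b are updated."""
    rng = np.random.default_rng(seed)
    n, d = features.shape
    W = rng.normal(scale=0.01, size=(d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # d(mean cross-entropy)/d(logits)
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(features, W, b):
    return np.argmax(features @ W + b, axis=1)


rng = np.random.default_rng(42)
X, y = make_fake_frozen_features(n_per_class=100, num_classes=3, dim=64, rng=rng)
perm = rng.permutation(len(y))
train, test = perm[:240], perm[240:]
W, b = train_linear_probe(X[train], y[train], num_classes=3)
acc = (predict(X[test], W, b) == y[test]).mean()
print(f"held-out accuracy: {acc:.2f}")
print("trainable parameters:", W.size + b.size)  # 64 * 3 + 3 = 195
```

In this sketch the backbone is never touched, so adapting to a new dataset such as MammalNet or ChimpBehave only means fitting a new small `W` and `b`. This is the source of the training-time and parameter savings the TL;DR describes.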

Abstract

Analyzing animal and human behavior has long been a challenging task in computer vision. Early approaches from the 1970s to the 1990s relied on hand-crafted edge detection, segmentation, and low-level features such as color, shape, and texture to locate objects and infer their identities, an inherently ill-posed problem. Behavior analysis in this era typically proceeded by tracking the identified objects over time and modeling their trajectories with sparse feature points, which further limited robustness and generalization. A major shift came with the introduction of ImageNet by Deng and Li in 2010. It enabled large-scale visual recognition with deep neural networks and effectively served as a comprehensive visual dictionary, allowing object recognition to move beyond complex low-level processing toward learned high-level representations. In this work, we follow this paradigm and build a large-scale Universal Action Space (UAS) from existing labeled human-action datasets. We then use this UAS as the foundation for analyzing and categorizing mammalian and chimpanzee behavior datasets. The source code is released on GitHub at https://github.com/franktpmvu/Universal-Action-Space.

Paper Structure

This paper contains 17 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: An overview of the Universal Action Space (UAS). We first derive features from complex human actions to build the UAS, and downstream tasks then utilize this feature space to construct their respective subspaces.
  • Figure 2: Heat map visualization of motion features captured by the Video Swin Transformer. Regions with higher motion intensity are shown in red.
  • Figure 3: Examples of mammalian behaviors included in the MammalNet dataset. This figure demonstrates the diversity of actions across different species.