From Actions to Kinesics: Extracting Human Psychological States through Bodily Movements

Cheyu Lin, Katherine A. Flanigan

TL;DR

A kinesics recognition framework that infers the communicative functions of human activity—known as kinesics—directly from 3D skeleton joint data by leveraging transfer learning to bypass the need for manually defined mappings between physical actions and psychological categories.

Abstract

Understanding the dynamic relationship between humans and the built environment is a key challenge in disciplines ranging from environmental psychology to reinforcement learning (RL). A central obstacle in modeling these interactions is the inability to capture human psychological states in a way that is both generalizable and privacy preserving. Traditional methods rely on theoretical models or questionnaires, which are limited in scope, static, and labor intensive. We present a kinesics recognition framework that infers the communicative functions of human activity -- known as kinesics -- directly from 3D skeleton joint data. Combining a spatial-temporal graph convolutional network (ST-GCN) with a convolutional neural network (CNN), the framework leverages transfer learning to bypass the need for manually defined mappings between physical actions and psychological categories. The approach preserves user anonymity while uncovering latent structures in bodily movements that reflect cognitive and emotional states. Our results on the Dyadic User EngagemenT (DUET) dataset demonstrate that this method enables scalable, accurate, and human-centered modeling of behavior, offering a new pathway for enhancing RL-driven simulations of human-environment interaction.

Paper Structure

This paper contains 2 figures, 2 tables.

Figures (2)

  • Figure 1: Sample frames for each interaction (class label is denoted in parentheses).
  • Figure 2: The kinesics recognition framework comprises skeleton data preparation, ST-GCN, and CNN. (Note: In "Skeleton Data," double-bounded boxes represent dictionary keys, and each connecting line points to the corresponding value in the layer below.)