Read the Room: Inferring Social Context Through Dyadic Interaction Recognition in Cyber-physical-social Infrastructure Systems

Cheyu Lin, John Martins, Katherine A. Flanigan, Ph.D.

TL;DR

This paper advances cyber-physical-social infrastructure systems (CPSIS) by focusing on social benefits and privacy-preserving measurement of human interactions within infrastructure. It introduces a dyadic interaction dataset drawn from a five-category taxonomy and benchmarks five skeleton-based recognition models on 12 dyadic interactions, finding ConvLSTM to be the most effective for capturing spatiotemporal social cues. The study reveals that depth-sensor–based skeleton data can enable robust dyadic interaction recognition even under occlusion, suggesting practical pathways for integrating social objectives into CPSIS. The work lays groundwork for mapping social interactions to social benefits, with potential applications in healthcare, smart spaces, and autonomous systems, while highlighting the need for deeper understanding of social meanings and more efficient implementations.

Abstract

Cyber-physical systems (CPS) integrate sensing, computing, and control to improve infrastructure performance, focusing on economic goals like performance and safety. However, they often neglect potential human-centered (or "social") benefits. Cyber-physical-social infrastructure systems (CPSIS) aim to address this by aligning CPS with social objectives. This involves defining social benefits, understanding human interactions with each other and with infrastructure, developing privacy-preserving measurement methods, modeling these interactions for prediction, linking them to social benefits, and actuating the physical environment to foster positive social outcomes. This paper focuses on recognizing dyadic human interactions using real-world data, which is the backbone of measuring social behavior. Doing so lays a foundation for understanding the deeper meanings and mutual responses inherent in human interactions. RGB cameras are informative for interaction recognition but raise privacy concerns. Depth sensors offer a privacy-conscious alternative by analyzing skeletal movements. This study compares five skeleton-based interaction recognition algorithms on a dataset of 12 dyadic interactions. Unlike single-person datasets, these interactions, categorized into communication types such as emblems and affect displays, offer insights into the cultural and emotional aspects of human interactions.

Paper Structure

This paper contains 7 sections, 1 equation, and 6 figures.

Figures (6)

  • Figure 1: Situation of work within CPSIS [annaswamy_control_nodate].
  • Figure 2: Sample depth data of 12 interactions. From left to right and top to bottom: waving in, thumbs-up, waving, pointing, showing measurements, hugging, laughing, arm crossing, nodding, writing circles in the air, holding palms out, and twirling or scratching hair.
  • Figure 3: The testbed includes (a) the Azure Kinect mounted 221 cm above the ground and tilted 37 degrees downward, (b) a 218.5 cm $\times$ 180.5 cm interaction area that is set 262 cm away from the sensing module to ensure a full view, in (c) an open indoor testbed location.
  • Figure 4: The features extracted from skeletons are (a) 32 joints, (b) angles defined by three joints, and (c) distances that connect two joints.
  • Figure 5: Overall structure of the deep learning models, which includes the input layer (which depends on the model considered), model-specific layers, two fully connected layers containing 32 and 12 neurons, respectively, and the output layer with a softmax activation function.
  • ...and 1 more figures
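
The skeleton features named in Figure 4 (angles defined by three joints, distances connecting two joints) and the softmax output described in Figure 5 can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the example coordinates, and the convention of returning a zero angle for coincident joints are assumptions made for illustration, and the specific joint pairs and triples used in the paper are not reproduced here.

```python
import math


def joint_distance(a, b):
    """Euclidean distance between two 3D joint positions."""
    return math.dist(a, b)


def joint_angle(a, b, c):
    """Angle in radians at joint b, formed by the segments b->a and b->c."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    norm = math.hypot(*ba) * math.hypot(*bc)
    if norm == 0.0:
        # Assumed convention: coincident joints yield a zero angle.
        return 0.0
    cos_theta = sum(x * y for x, y in zip(ba, bc)) / norm
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def softmax(logits):
    """Numerically stable softmax, e.g. over 12 interaction-class scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical joint coordinates in metres (not taken from the dataset).
shoulder = (0.0, 1.4, 2.5)
elbow = (0.3, 1.4, 2.5)
wrist = (0.3, 1.7, 2.5)

elbow_angle = joint_angle(shoulder, elbow, wrist)
forearm_length = joint_distance(elbow, wrist)
class_probabilities = softmax([0.0] * 12)
```

Features like these would be computed per frame for both people in the dyad and stacked over time before being passed to a sequence model; how the paper arranges them for each of the five models is not specified in this summary.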