Leveraging GCN-based Action Recognition for Teleoperation in Daily Activity Assistance

Thomas M. Kwok, Jiaan Li, Yue Hu

TL;DR

The paper addresses remote caregiving for older adults by removing the need for direct motion synchronization in teleoperation. It introduces a simplified spatio-temporal graph convolutional network (S-ST-GCN) that recognizes caregiver actions from RGB-skeleton-object data and maps them to preset robot trajectories, with a finite-state machine (FSM) that filters out misclassifications. Experiments show that a 40-frame moving window yields nearly 90% action recognition accuracy, that recognition remains robust to unseen utensils, and that teleoperation is reliable, with measurable delays that continuous recognition mitigates. The approach reduces operator fatigue, simplifies setup by avoiding markers and precise calibration, and shows promise for assistance with activities of daily living (ADLs). Planned future work includes integrating advanced motion planning and running user studies on telepresence and usability.

Abstract

Caregiving of older adults is an urgent global challenge, with many older adults preferring to age in place rather than enter residential care. However, providing adequate home-based assistance remains difficult, particularly in geographically vast regions. Teleoperated robots offer a promising solution, but conventional motion-mapping teleoperation imposes unnatural movement constraints on operators, leading to muscle fatigue and reduced usability. This paper presents a novel teleoperation framework that leverages action recognition to enable intuitive remote robot control. Using our simplified Spatio-Temporal Graph Convolutional Network (S-ST-GCN), the system recognizes human actions and executes corresponding preset robot trajectories, eliminating the need for direct motion synchronization. A finite-state machine (FSM) is integrated to enhance reliability by filtering out misclassified actions. Our experiments demonstrate that the proposed framework enables effortless operator movement while ensuring accurate robot execution. This proof-of-concept study highlights the potential of teleoperation with action recognition for enabling caregivers to remotely assist older adults during activities of daily living (ADLs). Future work will focus on improving the S-ST-GCN's recognition accuracy and generalization, integrating advanced motion planning techniques to further enhance robotic autonomy in older adult care, and conducting a user study to evaluate the system's telepresence and ease of control.
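
The abstract states that an FSM filters out misclassified actions before a preset trajectory is executed, but it does not describe the FSM's states or transitions. The sketch below illustrates one plausible reading, under assumptions: a prediction is accepted only if it is an allowed successor of the current state. The transition table, the class `ActionFSM`, the helper `filter_actions`, and every action name other than "stab" (which appears in Figure 1) are hypothetical placeholders, not the authors' design.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical transition table (an assumption, not taken from the paper).
# Only "stab" is named in the source; "idle" and "feed" are placeholders.
ALLOWED: Dict[str, Set[str]] = {
    "idle": {"stab"},
    "stab": {"feed"},
    "feed": {"idle"},
}


class ActionFSM:
    """Accepts a predicted action only if it is a valid successor state."""

    def __init__(self, start: str = "idle") -> None:
        self.state = start

    def step(self, predicted: str) -> Optional[str]:
        """Return the action to execute, or None if the prediction is ignored."""
        if predicted == self.state:
            return None  # same action is still in progress, nothing new to run
        if predicted in ALLOWED.get(self.state, set()):
            self.state = predicted
            return predicted
        return None  # implausible transition, treated as a misclassification


def filter_actions(predictions: Iterable[str]) -> List[str]:
    """Run a stream of per-window predictions through the FSM."""
    fsm = ActionFSM()
    accepted = (fsm.step(p) for p in predictions)
    return [a for a in accepted if a is not None]
```

With this table, a spurious "stab" predicted while the state is "feed" is dropped, because "stab" is not an allowed successor of "feed". Each accepted action would then be mapped to its preset robot trajectory.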

Paper Structure

This paper contains 12 sections, 9 figures, and 2 tables.

Figures (9)

  • Figure 1: The simplified spatio-temporal Graph Convolutional Network (S-ST-GCN) model for action recognition. The 'Stab' action with a moving window of 0.2s is used as an illustrative example.
  • Figure 2: Graph preparation for each frame. (a) Detected pose landmarks and objects. (b) Structures of the spatio-temporal graph. For example, black nodes and edges depict the spatial graph in the i-th frame, while purple nodes and edges represent the graph in the subsequent frame. Yellow lines indicate the temporal edges linking these two frames.
  • Figure 3: An illustrative example of moving windows with a fixed window size of 4 frames in a 20 fps video.
  • Figure 4: The proposed teleoperation framework using spatio-temporal action recognition. It includes (a) capturing the operator’s movement over a moving window of frames ($W_t$) as an action command, (b) detecting the nodes (i.e., pose ($V_p$) and object ($V_o$)) of the frames in the moving window ($V_w$) for action recognition, and (c) executing the recognized action ($a_h$) on the robot with visual feedback to the operator.
  • Figure 5: Setup of the teleoperation framework: (a) operator side and (b) remote robot side.
  • ...and 4 more figures
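
The captions for Figures 2 and 3 describe two mechanisms that are easier to follow in code: slicing a video into fixed-size moving windows, and linking per-frame spatial graphs with temporal edges. The sketch below is illustrative only. The function names, the stride of one frame, and the choice to connect each node to the same node in the next frame (the usual ST-GCN convention) are assumptions. The captions do not state the stride or exactly which nodes the temporal edges join.

```python
from typing import List, Sequence, Tuple

# A graph node is identified by (frame index, node index within the frame).
Node = Tuple[int, int]
Edge = Tuple[Node, Node]


def moving_windows(frames: Sequence, size: int, stride: int = 1) -> List[Sequence]:
    """Return every window of `size` consecutive frames, advancing by `stride`."""
    return [frames[i:i + size] for i in range(0, len(frames) - size + 1, stride)]


def window_duration(size: int, fps: float) -> float:
    """Length of a window in seconds for a given frame rate."""
    return size / fps


def spatio_temporal_edges(
    num_frames: int,
    num_nodes: int,
    spatial: Sequence[Tuple[int, int]],
) -> List[Edge]:
    """Build spatial edges inside each frame and temporal edges between frames.

    Assumption: a temporal edge joins node n in frame t to node n in frame t + 1.
    """
    edges: List[Edge] = []
    for t in range(num_frames):
        # Spatial edges: the same skeleton/object connectivity in every frame.
        for u, v in spatial:
            edges.append(((t, u), (t, v)))
        # Temporal edges: link each node to its counterpart in the next frame.
        if t + 1 < num_frames:
            for n in range(num_nodes):
                edges.append(((t, n), (t + 1, n)))
    return edges
```

For the Figure 3 setting, `window_duration(4, 20)` gives 0.2 s, which matches the 0.2 s window used as the example in Figure 1. A one-second clip of 20 frames yields 17 overlapping windows when the stride is one frame.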