
Extending 3D body pose estimation for robotic-assistive therapies of autistic children

Laura Santos, Bernardo Carvalho, Catarina Barata, José Santos-Victor

TL;DR

This work addresses accurate, non-intrusive 3D pose estimation for autistic children in robotic-assisted therapy by adapting a state-of-the-art CRMH-based 3D reconstruction pipeline to children. It personalizes the focal length input via a height-based regression, learned with RANSAC, to correct child-specific depth translation and enable both offline pose recovery and online interaction. In controlled experiments, the proposed CRMH-p achieves 3D errors below $0.3$ m, outperforming a BEV baseline, and in real therapy sessions it recovers skeletons missed by Kinect while maintaining competitive pose orientation accuracy. The approach offers a practical route to reliable pose estimation in occluded, unconstrained therapy settings, with future work aiming to optimize acquisition geometry and realize online deployment.

Abstract

Robotic-assistive therapy has demonstrated very encouraging results for children with autism. Accurate estimation of the child's pose is essential both for human-robot interaction and for therapy assessment purposes. Non-intrusive methods are the sole viable option since these children are sensitive to touch. While depth cameras have been used extensively, existing methods face two major limitations: (i) they are usually trained with adult-only data and do not correctly estimate a child's pose, and (ii) they fail in scenarios with a high number of occlusions. Therefore, our goal was to develop a 3D pose estimator for children by adapting an existing state-of-the-art 3D body modelling method and incorporating a linear regression model to fine-tune one of its inputs, thereby correcting the pose of children's 3D meshes. In controlled settings, our method has an error below $0.3$ m, which is considered acceptable for this kind of application and is lower than that of current state-of-the-art methods. In real-world settings, the proposed model performs similarly to a Kinect depth camera and successfully estimates the 3D body poses in a much higher number of frames.
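The height-based regression mentioned above can be illustrated with a minimal sketch. It assumes a linear relation between a child's height and the focal-length input, fitted robustly with a basic RANSAC loop. The paper's actual variables, data, and RANSAC settings are not given in this summary, so the data points, the coefficients, the units (metres and pixels), and the names `fit_line`, `ransac_line`, and `focal_for_height` are all hypothetical.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b over (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def ransac_line(points, n_iters=200, threshold=20.0, seed=0):
    """Basic RANSAC: sample two points, count inliers, refit on the best set."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # a vertical candidate line cannot be written as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    a, b = fit_line(best_inliers)
    return a, b, best_inliers


# Hypothetical calibration data: (height in m, focal-length input in px).
# Generated from an invented relation f = 400*h + 300 with small alternating noise.
samples = [
    (1.0 + 0.05 * i, 400.0 * (1.0 + 0.05 * i) + 300.0 + (2.0 if i % 2 == 0 else -2.0))
    for i in range(11)
]
# Two invented gross outliers, e.g. from failed reconstructions.
samples += [(1.10, 1500.0), (1.40, 200.0)]

a, b, inliers = ransac_line(samples)


def focal_for_height(height_m):
    """Predict the personalised focal-length input for a given height."""
    return a * height_m + b


print(f"slope={a:.1f} px/m, intercept={b:.1f} px, inliers={len(inliers)}")
print(f"focal length for a 1.20 m child: {focal_for_height(1.20):.1f} px")
```

RANSAC is used here rather than plain least squares because a few badly reconstructed frames would otherwise pull the fitted line away from the bulk of the data; the inlier threshold and iteration count are illustrative choices, not values from the paper.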

Paper Structure (12 sections, 5 equations, 5 figures, 7 tables)

This paper contains 12 sections, 5 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Example of challenging position during therapy captured by a depth camera. The represented skeleton is a hybrid of the upper body of the therapist and the lower body of the child.
  • Figure 2: Kinect and CRMH outputs for an image with body occlusions. While the Kinect skeletons for the therapist and child are degraded by the large number of self- and interpersonal occlusions, the CRMH model is able to effectively reconstruct the two skeletons.
  • Figure 3: Example of the CRMH problems during therapy sessions with children. Although in the front view (b) the meshes of the child and therapist are correctly identified, in (c) we observe an incorrect translation of the child's mesh. To account for the smaller dimensions of the child, the mesh is reconstructed as an adult at an increased depth.
  • Figure 4: Skeleton schemes of the systems. Corresponding joints used for evaluation are marked by circles of the same colour. Since the systems' skeletons do not match perfectly, for some joints (for example, 14 in (a)) we had to use the mean of two associated markers.
  • Figure 5: Depth values for the child's hip joint using different systems: CRMH, BEV, CRMH-personalised (CRMH-p) and Optitrack. The proposed model improved the performance of the original CRMH. In the middle depth range ($1.9\,\mathrm{m} < z < 2.5\,\mathrm{m}$) the accuracy of the proposed model is notably high.
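
The depth error described in Figure 3 is consistent with the scale-depth ambiguity of a pinhole camera. As a rough sketch (the symbols $f$, $H$, $h$ and $z$ are introduced here for illustration and are not the paper's notation), a person of height $H$ at depth $z$ projects to an image height $h$ under focal length $f$:

$$h = \frac{f\,H}{z} \quad\Longrightarrow\quad z = \frac{f\,H}{h}.$$

If a model assumes an adult height $H_a$ for a child of true height $H_c < H_a$, the same image height $h$ yields an overestimated depth

$$z_a = \frac{f\,H_a}{h} = z_c\,\frac{H_a}{H_c} > z_c .$$

Under this simplified model, feeding a rescaled focal length $f' = f\,H_c / H_a$ would give $f' H_a / h = z_c$, which is one way a height-dependent focal-length input could compensate for the depth offset. The paper's own formulation may differ.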