Point Cloud-based Grasping for Soft Hand Exoskeleton
Chen Hu, Enrica Tricomi, Eojin Rho, Daekyum Kim, Lorenzo Masia, Shan Luo, Letizia Gionfrida
TL;DR
This work addresses assistive grasping with a tendon-driven soft hand exoskeleton. It introduces a depth-based geometric vision framework that predicts the grasp target from 3D point clouds and drives PID-based activation. The pipeline uses PROSAC for plane fitting, DBSCAN for clustering, and PCA for object centroids, then builds a relational graph to select the target object. The resulting controller is fast and data-efficient, achieving a Grasping Ability Score (GAS) of $91\pm2\%$ across 15 objects and 10 participants. Compared with data-driven vision methods, the approach generalizes better to unseen objects, with an RSR of about $94.29\%$ on seen objects and $92.50\%$ on unseen objects, runs in real time at about $10.72$ fps, and also improves finger kinematics. The results suggest robust, interpretable perception-driven control suitable for embedded deployment; future directions include dynamic grasp strength modulation and event-based vision to further improve responsiveness and safety.
Abstract
Grasping is a fundamental skill for interacting with and manipulating objects in the environment, but it can be challenging for individuals with hand impairments. Soft hand exoskeletons designed to assist grasping can enhance or restore essential hand functions, yet controlling them to support users effectively remains difficult because the controller must understand a complex environment. This study presents a vision-based predictive control framework that leverages contextual awareness from depth perception to predict the grasping target and determine the next control state for activation. Unlike data-driven approaches, which require extensive labelled datasets and struggle to generalize, our method is grounded in geometric modelling, enabling robust adaptation across diverse grasping scenarios. Performance was evaluated with the Grasping Ability Score (GAS): our system achieved a state-of-the-art GAS of 91% across 15 objects with healthy participants, demonstrating its effectiveness across different object types. The proposed approach also maintained reconstruction success for unseen objects, underscoring its stronger generalizability compared with learning-based models.
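
To make the geometric pipeline named in the TL;DR more concrete, the following is a minimal NumPy sketch of its three perception stages: plane removal, clustering, and per-object centroid and principal axes. It is an illustration under stated assumptions, not the authors' implementation. Plain RANSAC stands in for PROSAC (which additionally orders samples by a quality score), the DBSCAN is a simple O(n^2) version, and the nearest-centroid rule in `select_target` is a hypothetical placeholder for the paper's relational graph. All function names, thresholds, and the `reference` point are assumptions.

```python
import numpy as np

def fit_plane_ransac(points, n_iters=200, threshold=0.01, seed=0):
    """Return a boolean inlier mask for the dominant plane.
    Plain RANSAC, used here as a simpler stand-in for PROSAC."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:  # skip degenerate (collinear) samples
            continue
        normal /= norm
        distances = np.abs((points - sample[0]) @ normal)
        inliers = distances < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

def dbscan(points, eps=0.03, min_pts=5):
    """Minimal O(n^2) DBSCAN; returns integer labels, -1 marks noise."""
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    neighbours = [np.flatnonzero(row <= eps) for row in dists]
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbours[i]) < min_pts:
            continue  # not a core point; stays noise unless claimed later
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border or core point joins the cluster
            if visited[j]:
                continue
            visited[j] = True
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # expand only through core points
        cluster += 1
    return labels

def describe_cluster(cluster_points):
    """Centroid plus PCA axes (columns, sorted by decreasing variance)."""
    centroid = cluster_points.mean(axis=0)
    _, eigvecs = np.linalg.eigh(np.cov((cluster_points - centroid).T))
    return centroid, eigvecs[:, ::-1]

def select_target(points, reference, eps=0.03, min_pts=5):
    """Hypothetical selection rule: pick the object nearest to `reference`.
    The paper uses a relational graph instead; this is only a placeholder."""
    objects = points[~fit_plane_ransac(points)]
    if len(objects) == 0:
        return None
    labels = dbscan(objects, eps, min_pts)
    candidates = [describe_cluster(objects[labels == k])
                  for k in range(labels.max() + 1)]
    if not candidates:
        return None
    return min(candidates, key=lambda c: np.linalg.norm(c[0] - reference))
```

Given an `(N, 3)` array of points in metres and a reference position such as an estimated hand location, `select_target` returns the centroid and principal axes of the chosen object, or `None` when nothing is left after plane removal. The pairwise-distance matrix in `dbscan` grows quadratically with the number of points, so a real-time system would more likely downsample the cloud or use a spatial index.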
