Point Cloud-based Grasping for Soft Hand Exoskeleton

Chen Hu, Enrica Tricomi, Eojin Rho, Daekyum Kim, Lorenzo Masia, Shan Luo, Letizia Gionfrida

TL;DR

This work tackles assistive grasping with a tendon-driven soft hand exoskeleton by introducing a depth-based geometric vision framework that predicts grasp targets from 3D point clouds and drives PID-based activation. It uses PROSAC for plane fitting, DBSCAN for clustering, and PCA for object centroids to build a relational graph and select the target object, enabling a fast, data-efficient controller that achieves a Grasping Ability Score (GAS) of $91\pm2\%$ across 15 objects and 10 participants. Compared with data-driven vision methods, the proposed approach generalizes better to unseen objects, with an $RSR$ of about $94.29\%$ on seen objects and $92.50\%$ on unseen objects, and it runs in real time at about $10.72$ fps while also improving finger kinematics. The results suggest robust, interpretable perception-driven control suitable for embedded deployment, with future directions including dynamic grasp strength modulation and event-based vision to further enhance responsiveness and safety.
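
As a rough illustration of the PID-based activation mentioned above, the following is a minimal sketch of a textbook discrete PID loop. It is not the authors' controller: the class name `PID`, the gains, the time step, and the setpoint are all hypothetical placeholders chosen only to make the example runnable.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller (gains are hypothetical, not from the paper)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0
    has_prev: bool = False

    def update(self, setpoint: float, measurement: float, dt: float) -> float:
        """Return the control output for one time step of length dt (seconds)."""
        if dt <= 0:
            raise ValueError("dt must be positive")
        error = setpoint - measurement
        # Accumulate the integral term with a simple rectangle rule.
        self.integral += error * dt
        # Skip the derivative on the first call, when no previous error exists.
        derivative = (error - self.prev_error) / dt if self.has_prev else 0.0
        self.prev_error = error
        self.has_prev = True
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical usage: track a target actuator velocity of 1.0 (arbitrary units).
pid = PID(kp=2.0, ki=0.5, kd=0.1)
command = pid.update(setpoint=1.0, measurement=0.0, dt=0.1)
```

In a real device the output would normally be clamped to the actuator's limits and the integral term protected against wind-up; both are omitted here to keep the sketch short.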

Abstract

Grasping is a fundamental skill for interacting with and manipulating objects in the environment. However, this ability can be challenging for individuals with hand impairments. Soft hand exoskeletons designed to assist grasping can enhance or restore essential hand functions, yet controlling these soft exoskeletons to support users effectively remains difficult due to the complexity of understanding the environment. This study presents a vision-based predictive control framework that leverages contextual awareness from depth perception to predict the grasping target and determine the next control state for activation. Unlike data-driven approaches that require extensive labelled datasets and struggle with generalizability, our method is grounded in geometric modelling, enabling robust adaptation across diverse grasping scenarios. The Grasping Ability Score (GAS) was used to evaluate performance, with our system achieving a state-of-the-art GAS of 91% across 15 objects and 10 healthy participants, demonstrating its effectiveness across different object types. The proposed approach maintained reconstruction success for unseen objects, underscoring its enhanced generalizability compared to learning-based models.

Paper Structure

This paper contains 19 sections, 11 equations, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: System design: (a) A vision-based controller was developed for the study. The controller reconstructs the 3D point cloud from depth frames using the camera's intrinsic parameters. Neighbourhood density is calculated as a confidence measure to sort the point cloud, and the largest planar model (table) and the convex hull (objects) in the contextual perception are identified using PROSAC [chum2005matching]. The category of each point in the point cloud is determined using DBSCAN [Ester1996DBSCAN]. Principal Component Analysis (PCA) [Pearson1901PCA] is applied to compute the centroid of each object category. These centroids are used to generate a relationship graph among objects, and the centroid closest to the camera's optical axis is identified as the target object. When the distance between this object and the camera plane is less than an adaptable threshold $\tau$, velocity PID control is triggered to assist the user in completing the grasping task. Force-sensing mode: a force-sensitive resistor (FSR) sensor mounted at the fingertip of the hand exoskeleton measures pressure values; a grip command is triggered when the pressure from either sensor exceeds a defined threshold. Push-button mode: commands are transmitted via the corresponding button presses. (b) The soft hand exoskeleton, based on an existing design [rho2021learning], consists of three main components: an embedded actuator, a customized soft exoskeleton that transfers force to the finger joints, and a sensing module. The actuation system is mounted on a shin guard worn on the forearm for optimized weight distribution. Additionally, 3D-printed nails and rings are designed and installed near the metacarpophalangeal, proximal interphalangeal, and distal interphalangeal joints to replicate flexion tendons passing through the joints.
  • Figure 2: Three grip types (pinch, spherical grip, and cylindrical grip) are used to grasp the 15 objects.
  • Figure 3: Bar chart of Grasping Ability Scores (GAS, %) for 10 users grasping 15 objects under three control modes: push-button mode (orange), force-sensing mode (blue), and vision-based mode (purple). The table below shows the distances (mm) between the tips of the thumb, index finger, middle finger, and wrist. Longer fingers correlate with higher GAS. Significance levels are indicated by asterisks (*), where * denotes p $\leq$ 0.05, ** denotes p $\leq$ 0.01, and *** denotes p $\leq$ 0.001.
  • Figure 4: Grasping Ability Scores (GAS) for 15 objects across three grasping modes: the push-button mode (orange), force-sensing mode (blue), and vision-based mode (purple), displayed in a bar chart. The table shows the parameters of 15 objects, including mass, dimensions, and material to illustrate their influence on the GAS. Significance levels are indicated by asterisks (*), where * denotes p $\leq$ 0.05, ** denotes p $\leq$ 0.01, and *** denotes p $\leq$ 0.001.
  • Figure 5: Confusion matrix depicting the average Grasping Ability Scores (GAS) across 10 users for 15 objects evaluated in three modes: push-button, force-sensing, and vision. The intensity of the colour reflects the GAS, with darker blue indicating higher scores and lighter yellow indicating lower scores.
  • ...and 2 more figures
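
The perception steps described in the Figure 1 caption (back-projecting depth pixels with the camera intrinsics, computing a centroid per object, picking the centroid closest to the optical axis, and triggering when its depth falls below $\tau$) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes a standard pinhole camera model with the optical axis along $z$, takes the centroid as the mean of the points (the point PCA is centred on), and skips the PROSAC and DBSCAN stages by assuming the points are already grouped per object. All function names, the intrinsics `fx`, `fy`, `cx`, `cy`, and the numeric values are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the camera frame, metres


def deproject(u: float, v: float, depth: float,
              fx: float, fy: float, cx: float, cy: float) -> Point:
    """Back-project pixel (u, v) with a given depth using a pinhole model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def centroid(points: Sequence[Point]) -> Point:
    """Mean of a cluster's points, i.e. the point PCA is centred on."""
    n = len(points)
    if n == 0:
        raise ValueError("cluster must contain at least one point")
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def select_target(centroids: Sequence[Point]) -> Optional[int]:
    """Index of the centroid closest to the optical (z) axis, or None if empty."""
    if not centroids:
        return None
    # Distance to the z axis is the norm of the (x, y) components.
    return min(range(len(centroids)),
               key=lambda i: math.hypot(centroids[i][0], centroids[i][1]))


def should_trigger(target: Point, tau: float) -> bool:
    """True when the target's distance to the camera plane is below tau."""
    return target[2] < tau


# Hypothetical usage with three already-clustered object centroids.
centroids = [(0.10, 0.02, 0.40), (0.01, -0.01, 0.25), (-0.20, 0.10, 0.30)]
idx = select_target(centroids)
grasp_now = idx is not None and should_trigger(centroids[idx], tau=0.30)
```

Measuring distance to the optical axis with only the $x$ and $y$ components keeps the target choice independent of how far away the object is, so the depth threshold $\tau$ alone decides when assistance starts.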