PressureVision++: Estimating Fingertip Pressure from Diverse RGB Images

Patrick Grady, Jeremy A. Collins, Chengcheng Tang, Christopher D. Twigg, Kunal Aneja, James Hays, Charles C. Kemp

TL;DR

PressureVision++ introduces a weakly supervised approach to estimating fingertip pressure from RGB images. Participants are prompted to make contact in a specified way, and the prompt serves as a weak label, which allows training on diverse, uninstrumented surfaces. By jointly predicting per-pixel pressure and contact labels and applying adversarial domain adaptation, the model performs robustly across textures and geometries and outperforms prior vision-based methods and human annotators. The authors collect ContactLabelDB, with 51 participants and 2.9M frames, and demonstrate mixed reality (MR) applications in which everyday surfaces become touch-sensitive interfaces, including a surface-drawing tool and a touch-typing keyboard. Code, data, and models are released, supporting the practicality of non-invasive visual pressure sensing for real-world hand-surface interactions.

Abstract

Touch plays a fundamental role in manipulation for humans; however, machine perception of contact and pressure typically requires invasive sensors. Recent research has shown that deep models can estimate hand pressure based on a single RGB image. However, evaluations have been limited to controlled settings since collecting diverse data with ground-truth pressure measurements is difficult. We present a novel approach that enables diverse data to be captured with only an RGB camera and a cooperative participant. Our key insight is that people can be prompted to apply pressure in a certain way, and this prompt can serve as a weak label to supervise models to perform well under varied conditions. We collect a novel dataset with 51 participants making fingertip contact with diverse objects. Our network, PressureVision++, outperforms human annotators and prior work. We also demonstrate an application of PressureVision++ to mixed reality where pressure estimation allows everyday surfaces to be used as arbitrary touch-sensitive interfaces. Code, data, and models are available online.

Paper Structure

This paper contains 38 sections, 7 equations, 12 figures, and 9 tables.

Figures (12)

  • Figure 2: Instrumenting surfaces with pressure sensors without altering their properties is challenging. For example, pressure sensors must be transparent in order to instrument glass, and must be stretchable in order to instrument a deformable mat.
  • Figure 3: We represent the contact labels as a six-dimensional vector. The first five elements are binary values indicating which fingers are prompted to be in contact, and the last element indicates the prompted force level. Fully labeled data has both pressure and contact labels, while weakly labeled data has only contact labels. Participants with a range of genders and skin tones were recruited for our study.
  • Figure 4: PressureVision++ architecture. First, hand crops are generated using the bounding boxes estimated by an off-the-shelf hand detector. The crops are passed into an encoder-decoder network to estimate pressure for each pixel in the input image. Two classification heads are attached to the bottleneck of the network; one is trained to estimate the contact label, and the other uses an adversarial loss to reduce the shift between fully labeled and weakly labeled domains.
  • Figure 5: Results on the fully labeled test set. The baseline column is PressureVision++ trained without either the domain loss or contact label loss. The bottom row shows a common failure mode where pressure is not estimated for occluded parts of the hand.
  • Figure 6: Results on the surfaces in the weakly labeled test set, none of which are included in the fully labeled training set. PressureVision++ produces qualitatively accurate results on highly textured, curved, and compliant surfaces. All images except the top row are zoomed in to show detail. The bottom right shows a failure case where pressure is underestimated on an object.
  • ...and 7 more figures
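
The six-dimensional contact label described in the Figure 3 caption can be illustrated with a short sketch. This is not the authors' code: the finger order, the representation of the force level as a single float, and the names `FINGERS`, `encode_contact_label`, and `Sample` are all assumptions made for illustration. The sketch only shows the layout (five binary finger flags followed by a force level) and the distinction between fully labeled samples, which carry a pressure map, and weakly labeled samples, which carry only the contact label.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Assumed finger order; the caption only states that the first five
# elements are binary flags, one per finger.
FINGERS = ("thumb", "index", "middle", "ring", "pinky")


def encode_contact_label(fingers_in_contact: Sequence[str],
                         force_level: float) -> List[float]:
    """Build a six-element label: five finger flags, then the force level.

    How the force level is encoded numerically is an assumption here;
    the caption only says the last element indicates the prompted level.
    """
    unknown = set(fingers_in_contact) - set(FINGERS)
    if unknown:
        raise ValueError(f"unknown finger names: {sorted(unknown)}")
    flags = [1.0 if name in fingers_in_contact else 0.0 for name in FINGERS]
    return flags + [float(force_level)]


@dataclass
class Sample:
    """Hypothetical container for one training frame."""
    contact_label: List[float]
    # Per-pixel pressure is present only for fully labeled data.
    pressure_map: Optional[List[List[float]]] = None

    @property
    def is_fully_labeled(self) -> bool:
        return self.pressure_map is not None


# Weakly labeled: only the prompt (index finger, assumed force code 1.0).
weak = Sample(contact_label=encode_contact_label(["index"], 1.0))

# Fully labeled: contact label plus a (toy 2x2) pressure map.
full = Sample(
    contact_label=encode_contact_label(["thumb", "index"], 0.0),
    pressure_map=[[0.0, 0.0], [0.0, 0.0]],
)
```

Under these assumptions, a training loop could branch on `is_fully_labeled` to decide whether a pressure loss applies to a given sample, while the contact label is available for every sample.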