EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision

Yiming Zhao, Taein Kwon, Paul Streli, Marc Pollefeys, Christian Holz

TL;DR

EgoPressure introduces the first large-scale egocentric hand pressure dataset with synchronized RGB-D data, per-contact pressure maps, and high-fidelity 3D hand meshes obtained via a marker-free, multi-view optimization pipeline. The authors show that incorporating accurate hand pose information improves pressure estimation from RGB images and demonstrate a novel UV-map–based pressure estimator (PressureFormer) that enables 3D pressure reconstruction on the hand surface. They establish two benchmarks: RGB-to-pressure estimation with and without hand pose, and joint hand pose–pressure estimation, providing strong baselines and a path toward integrated hand-object interaction understanding in egocentric vision. The dataset and methods pave the way for more realistic haptic-aware AR/VR and robotics, with potential extensions to non-planar objects and multi-hand interactions in the future.

Abstract

Touch contact and pressure are essential for understanding how humans interact with and manipulate objects, insights which can significantly benefit applications in mixed reality and robotics. However, estimating these interactions from an egocentric camera perspective is challenging, largely due to the lack of comprehensive datasets that provide both accurate hand poses on contacting surfaces and detailed annotations of pressure information. In this paper, we introduce EgoPressure, a novel egocentric dataset that captures detailed touch contact and pressure interactions. EgoPressure provides high-resolution pressure intensity annotations for each contact point and includes accurate hand pose meshes obtained through our proposed multi-view, sequence-based optimization method processing data from an 8-camera capture rig. Our dataset comprises 5 hours of recorded interactions from 21 participants captured simultaneously by one head-mounted and seven stationary Kinect cameras, which acquire RGB images and depth maps at 30 Hz. To support future research and benchmarking, we present several baseline models for estimating applied pressure on external surfaces from RGB images, with and without hand pose information. We further explore the joint estimation of the hand mesh and applied pressure. Our experiments demonstrate that pressure and hand pose are complementary for understanding hand-object interactions in AR/VR and robotics research. Project page: https://yiming-zhao.github.io/EgoPressure/

Paper Structure (49 sections, 8 equations, 36 figures, 10 tables, 1 algorithm)

This paper contains 49 sections, 8 equations, 36 figures, 10 tables, 1 algorithm.

Figures (36)

  • Figure 1: The EgoPressure dataset. We introduce a novel egocentric pressure dataset with hand poses. We label hand poses using our proposed optimization method across all static camera views (Cameras 1–7). The annotated hand mesh aligns well with the egocentric camera's view, indicating the high fidelity of our annotations. We project the pressure intensity and annotated hand mesh (Fig. i) to all camera views (Fig. a to h), and further provide the pressure applied over the hand as a UV texture map (Fig. j and k).
  • Figure 2: Method overview. The input for our annotation method consists of RGB-D images captured by 7 static Azure Kinect cameras and the pressure frame from a Sensel Morph touchpad. We leverage Segment-Anything [sam] and HaMeR [hamer] to obtain initial hand poses and masks. We refine the initial hand pose and shape estimates through differentiable rasterization [dibr] optimization across all static camera views. Using an additional virtual orthogonal camera placed below the touchpad, we reproject the captured pressure frame onto the hand mesh by optimizing the pressure as a texture feature of the corresponding UV map, while ensuring contact between the touchpad and all contact vertices.
  • Figure 3: 7 static + 1 egocentric camera rig
  • Figure 4: Camera pose tracking with IR markers
  • Figure 5: (a) t-SNE [van2008visualizing] visualization of hand pose frames $\theta$ over our dataset, with color coding for different gestures. All gestures are listed in Table \ref{tab:gestures} of the supplementary material. (b) Ratio of touch frames with contact for each vertex. (c) Maximum pressure over hand vertices across the dataset. (d) Mean length of performed gestures. (e) Distribution of $\beta$ values across participants.
  • ...and 31 more figures
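
The Figure 2 caption describes reprojecting the touchpad's pressure frame onto the hand mesh through a virtual camera placed below the touchpad. The sketch below is a deliberately simplified illustration of that idea, not the authors' pipeline: instead of optimizing a UV texture with a differentiable renderer, it does a nearest-cell lookup per vertex. The function name, the coordinate convention (touchpad in the z = 0 plane, x along columns, y along rows), the `pitch_mm` parameter, and the `contact_eps_mm` threshold are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# (x, y, z) in a hypothetical touchpad frame, in millimetres.
Vertex = Tuple[float, float, float]


def sample_pressure_at_vertices(
    vertices: Sequence[Vertex],
    pressure: Sequence[Sequence[float]],
    pitch_mm: float,
    contact_eps_mm: float = 2.0,
) -> List[float]:
    """Give each vertex the pressure of the sensor cell directly below it.

    Assumed convention: the touchpad lies in the z = 0 plane, cell
    (row 0, col 0) starts at the origin, x runs along columns and y along
    rows. Projecting along z therefore just drops the z coordinate.
    """
    rows, cols = len(pressure), len(pressure[0])
    sampled = []
    for x, y, z in vertices:
        # Parallel projection onto the sensor grid: floor to a cell index.
        col = int(x // pitch_mm)
        row = int(y // pitch_mm)
        inside = 0 <= row < rows and 0 <= col < cols
        # Only vertices close to the surface count as contact vertices.
        touching = abs(z) <= contact_eps_mm
        sampled.append(pressure[row][col] if inside and touching else 0.0)
    return sampled


if __name__ == "__main__":
    grid = [[0.0, 10.0], [20.0, 30.0]]  # toy 2x2 pressure frame
    verts = [(1.5, 0.5, 0.5), (0.5, 1.5, 5.0), (-0.2, 0.3, 0.0)]
    print(sample_pressure_at_vertices(verts, grid, pitch_mm=1.0))
```

A per-vertex lookup like this ignores occlusion and treats every near-surface vertex as a contact, which is one reason a renderer-based optimization with an explicit contact constraint, as the caption describes, is the more principled route.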