A Humanoid Visual-Tactile-Action Dataset for Contact-Rich Manipulation
Eunju Kwon, Seungwon Oh, In-Chang Baek, Yucheon Park, Gyungbo Kim, JaeYoung Moon, Yunho Choi, Kyung-Joong Kim
TL;DR
The paper tackles contact-rich manipulation of deformable objects using a humanoid robot with multi-modal sensing. It introduces a dense visual-tactile-action dataset collected via teleoperation, totaling 101.9k frames across towel and sponge tasks under strong and weak pressure conditions, with proprioception, egocentric vision, dense tactile maps from Inspire Hands, and tactile heatmaps. A neural fusion model built on the dense tactile input demonstrates the dataset's utility and reveals optimization challenges associated with high-dimensional tactile inputs. Contributions include the first humanoid visual-tactile-action dataset for soft-object manipulation, a dense-tactile fusion architecture, and a comprehensive data analysis to guide future dataset expansion and optimization strategies.
Abstract
Contact-rich manipulation has become increasingly important in robot learning. However, existing robot learning datasets have focused on rigid objects and underrepresent the diversity of pressure conditions encountered in real-world manipulation. To address this gap, we present a humanoid visual-tactile-action dataset designed for manipulating deformable soft objects. The dataset was collected via teleoperation using a humanoid robot equipped with dexterous hands, capturing multi-modal interactions under varying pressure conditions. This work also motivates future research on models with advanced optimization strategies that can effectively leverage the complexity and diversity of tactile signals.
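To make the dataset's structure concrete, the following is a minimal sketch of how a single frame with the modalities listed in the TL;DR might be represented, and how proprioceptive and tactile signals could be combined. The class name `Frame`, all field names, the array sizes, and the helpers `fuse` and `count_by_condition` are hypothetical and are not taken from the paper. The concatenation in `fuse` is a naive stand-in for illustration only, not the authors' dense-tactile fusion architecture.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Frame:
    """One time step of a teleoperated episode (hypothetical schema)."""
    task: str                    # e.g. "towel" or "sponge"
    pressure: str                # e.g. "strong" or "weak"
    proprioception: List[float]  # joint state; length is an assumption
    image: List[List[float]]     # egocentric view, shown here as a tiny grid
    tactile: List[List[float]]   # dense tactile map, rows x cols (assumed)
    action: List[float]          # teleoperated command for this step


def flatten(grid: List[List[float]]) -> List[float]:
    """Flatten a 2D grid row by row into a 1D list."""
    return [value for row in grid for value in row]


def fuse(frame: Frame) -> List[float]:
    """Naive fusion: concatenate proprioception with the flattened tactile map."""
    return list(frame.proprioception) + flatten(frame.tactile)


def count_by_condition(frames: List[Frame]) -> Dict[Tuple[str, str], int]:
    """Count frames per (task, pressure) pair to inspect dataset balance."""
    counts: Dict[Tuple[str, str], int] = {}
    for frame in frames:
        key = (frame.task, frame.pressure)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

The length of the fused vector grows with the number of tactile cells, which gives a rough sense of why high-dimensional tactile inputs can make optimization harder than with proprioception alone.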
