ManiFeel: Benchmarking and Understanding Visuotactile Manipulation Policy Learning
Quan Khanh Luu, Pokuang Zhou, Zhengtong Xu, Zhiyuan Zhang, Qiang Qiu, Yu She
TL;DR
ManiFeel addresses the lack of standardized benchmarks for visuotactile policy learning by introducing a scalable simulation benchmark with a 13-task suite spanning Insertion, Screwing, and Exploration, plus a modular evaluation pipeline for sensing modalities, tactile representations, and policy architectures. Through extensive sim-to-real experiments, it demonstrates that TacFF enhances force-sensitive, contact-rich manipulation while TacRGB supports texture-based perception under visually degraded conditions, revealing task-dependent modality strengths. The work analyzes design principles and sim-to-real correspondence, and provides open-source code, datasets, and pretrained checkpoints to accelerate future visuotactile policy development. By offering a reproducible and extensible platform, ManiFeel aims to catalyze progress toward robust, generalizable visuotactile robotic manipulation.
Abstract
Supervised visuomotor policies have shown strong performance in robotic manipulation but often struggle in tasks with limited visual inputs, such as operations in confined spaces and dimly lit environments, or tasks requiring precise perception of object properties and environmental interactions. In such cases, tactile feedback becomes essential for manipulation. While the rapid progress of supervised visuomotor policies has benefited greatly from high-quality, reproducible simulation benchmarks in visual imitation, the visuotactile domain still lacks a similarly comprehensive and reliable benchmark for large-scale and rigorous evaluation. To address this, we introduce ManiFeel, a reproducible and scalable simulation benchmark designed to systematically study supervised visuotactile policy learning. ManiFeel offers a diverse suite of contact-rich and visually challenging manipulation tasks, a modular evaluation pipeline spanning sensing modalities, tactile representations, and policy architectures, as well as real-world validation. Through extensive experiments, ManiFeel demonstrates how tactile sensing enhances policy performance across diverse manipulation scenarios, ranging from precise contact-driven operations to visually constrained settings. In addition, the results reveal task-dependent strengths of different tactile modalities and identify key design principles and open challenges for robust visuotactile policy learning. Real-world evaluations further confirm that ManiFeel provides a reliable and meaningful foundation for benchmarking and future visuotactile policy development. To foster reproducibility and future research, we will release our codebase, datasets, training logs, and pretrained checkpoints, aiming to accelerate progress toward generalizable visuotactile policy learning and manipulation.
