ViTaMIn-B: A Reliable and Efficient Visuo-Tactile Bimanual Manipulation Interface
Chuanyu Li, Chaoyi Liu, Daotan Wang, Shuyu Zhang, Lusong Li, Zecui Zeng, Fangchen Liu, Jing Xu, Rui Chen
TL;DR
ViTaMIn-B introduces DuoTact, a compliant handheld visuo-tactile sensor, together with a point-cloud representation of the sensor's deformation that improves cross-sensor generalization. It uses Meta Quest 3 controllers for unified 6-DoF bimanual pose tracking, avoiding the drift of SLAM-based tracking, and applies latency-compensated multi-modal synchronization, so bimanual demonstrations can be collected robustly without robot hardware. Across four tasks and ablation studies, tactile sensing improves success rates, the point-cloud input is robust across sensors, and novices can collect high-quality data efficiently. The work closes the gap between low-cost handheld data collection and high-fidelity multimodal demonstrations, and the authors plan to release the design files.
Abstract
Handheld devices have opened up unprecedented opportunities to collect large-scale, high-quality demonstrations efficiently. However, existing systems often lack the robust tactile sensing or reliable pose tracking needed to handle complex interaction scenarios, especially bimanual and contact-rich tasks. In this work, we propose ViTaMIn-B, a more capable and efficient handheld data collection system for such tasks. We first design DuoTact, a novel compliant visuo-tactile sensor built with a flexible frame to withstand large contact forces during manipulation while capturing high-resolution contact geometry. To improve cross-sensor generalizability, we propose reconstructing the sensor's global deformation as a 3D point cloud and using it as the policy input. We further develop a robust, unified 6-DoF bimanual pose acquisition process using Meta Quest controllers, which eliminates the trajectory drift common in SLAM-based methods. Comprehensive user studies confirm the efficiency and high usability of ViTaMIn-B among novice and expert operators. Furthermore, experiments on four bimanual manipulation tasks demonstrate its superior task performance relative to existing systems. Project page: https://chuanyune.github.io/ViTaMIn-B_page/
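
The TL;DR mentions latency-compensated multi-modal synchronization without giving details. As a rough illustration only, the sketch below shows one generic way to do this: subtract an assumed fixed latency from each sensor's timestamps, then match every reference timestamp (for example, a controller pose) to the nearest corrected sample. The function name `align_streams`, the fixed-latency model, and the `max_gap` rejection threshold are assumptions made for this example and are not taken from the ViTaMIn-B system.

```python
import numpy as np

def align_streams(ref_ts, stream_ts, stream_latency, max_gap):
    """Match each reference timestamp to the nearest latency-corrected sample.

    Hypothetical helper, not the authors' implementation.
    ref_ts:         reference timestamps (e.g. controller poses)
    stream_ts:      sorted arrival timestamps of another sensor (>= 2 samples)
    stream_latency: assumed constant delay between capture and arrival
    max_gap:        matches farther apart than this are rejected
    Returns an index into stream_ts per reference timestamp, or -1 if rejected.
    """
    # Shift arrival times back to the estimated capture times.
    corrected = np.asarray(stream_ts, dtype=float) - stream_latency
    ref = np.asarray(ref_ts, dtype=float)

    # Locate the neighbours on either side of each reference timestamp.
    right = np.clip(np.searchsorted(corrected, ref), 1, len(corrected) - 1)
    left = right - 1

    # Keep whichever neighbour is closer in time (ties go to the earlier one).
    pick_right = np.abs(corrected[right] - ref) < np.abs(ref - corrected[left])
    idx = np.where(pick_right, right, left)

    # Drop matches whose residual time offset is too large.
    idx[np.abs(corrected[idx] - ref) > max_gap] = -1
    return idx

# Timestamps in milliseconds; the camera stream is assumed to arrive 30 ms late.
pose_ts = [100, 200, 300, 1000]
camera_ts = [130, 230, 330, 430]
matches = align_streams(pose_ts, camera_ts, stream_latency=30, max_gap=20)
```

In this example the first three poses are paired with camera frames 0, 1, and 2, while the last pose has no frame within 20 ms and is marked -1. A real system would need to measure each sensor's latency rather than assume a constant.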
