ViTaMIn-B: A Reliable and Efficient Visuo-Tactile Bimanual Manipulation Interface

Chuanyu Li, Chaoyi Liu, Daotan Wang, Shuyu Zhang, Lusong Li, Zecui Zeng, Fangchen Liu, Jing Xu, Rui Chen

TL;DR

ViTaMIn-B introduces DuoTact, a compliant visuo-tactile handheld sensor, together with a point-cloud representation of the sensor's deformation, to address drift-prone SLAM tracking and poor cross-sensor generalization. It uses Meta Quest 3 controllers for unified 6-DoF bimanual pose tracking and latency-compensated multi-modal synchronization, so bimanual demonstrations can be collected robustly without robot hardware. Across four tasks and ablation studies, tactile sensing improves success rates and the point-cloud input is robust across sensors, and novices are able to collect high-quality data efficiently. The work closes the gap between low-cost handheld data collection and high-fidelity multimodal demonstrations; the authors plan to release the design files.

Abstract

Handheld devices have opened up unprecedented opportunities to collect large-scale, high-quality demonstrations efficiently. However, existing systems often lack robust tactile sensing or reliable pose tracking to handle complex interaction scenarios, especially for bimanual and contact-rich tasks. In this work, we propose ViTaMIn-B, a more capable and efficient handheld data collection system for such tasks. We first design DuoTact, a novel compliant visuo-tactile sensor built with a flexible frame to withstand large contact forces during manipulation while capturing high-resolution contact geometry. To enhance the cross-sensor generalizability, we propose reconstructing the sensor's global deformation as a 3D point cloud and using it as the policy input. We further develop a robust, unified 6-DoF bimanual pose acquisition process using Meta Quest controllers, which eliminates the trajectory drift issue in common SLAM-based methods. Comprehensive user studies confirm the efficiency and high usability of ViTaMIn-B among novice and expert operators. Furthermore, experiments on four bimanual manipulation tasks demonstrate its superior task performance relative to existing systems. Project page: https://chuanyune.github.io/ViTaMIn-B_page/
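As a rough illustration of the pose acquisition idea, the sketch below composes a tracked controller pose with a fixed hand-eye transform ${}^{Q}T_{EE}$ (the symbol used in the Figure 5 caption) to obtain an end-effector pose. The world frame `W`, the function names, and all numeric values are assumptions made for illustration; they are not taken from the paper, and the authors' actual calibration procedure may differ.

```python
# Minimal sketch, assuming poses are 4x4 homogeneous transforms stored as nested lists.
# Frame names: W = world (assumed), Q = Quest controller, EE = end effector.

def make_transform(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        for j in range(3):
            T[i][j] = float(rotation[i][j])
        T[i][3] = float(translation[i])
    T[3][3] = 1.0
    return T

def compose(A, B):
    """Matrix product A @ B for 4x4 transforms."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def controller_to_end_effector(T_W_Q, T_Q_EE):
    """Return the end-effector pose in W: T_W_EE = T_W_Q * T_Q_EE."""
    return compose(T_W_Q, T_Q_EE)

# Hypothetical values, for illustration only.
# Controller rotated 90 degrees about z and located at (1, 0, 0) in W.
ROT_Z_90 = [[0.0, -1.0, 0.0],
            [1.0,  0.0, 0.0],
            [0.0,  0.0, 1.0]]
IDENTITY = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
T_W_Q = make_transform(ROT_Z_90, [1.0, 0.0, 0.0])
# End effector offset 0.1 m along the controller's x axis, with no extra rotation.
T_Q_EE = make_transform(IDENTITY, [0.1, 0.0, 0.0])

T_W_EE = controller_to_end_effector(T_W_Q, T_Q_EE)
ee_position = [T_W_EE[i][3] for i in range(3)]
```

Because the hand-eye offset is fixed once calibrated, the same multiplication can be applied to every tracked controller sample from either hand; only `T_W_Q` changes over time.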

Paper Structure

This paper contains 23 sections, 4 equations, 8 figures, and 5 tables.

Figures (8)

  • Figure 1: Exploded view of DuoTact structure.
  • Figure 2: Diagram of the fabrication process for DuoTact.
  • Figure 3: Principle and result diagram of point cloud reconstruction. In Figure A, red lines represent inner edges of the frame. In Figure B, red lines represent detected inner edges in the photo, while the blue circles denote the corner points on the sensor top. Figure C shows the reconstruction effect.
  • Figure 4: Hardware composition of the ViTaMIn-B handheld device. The system integrates a Quest controller for bimanual pose tracking, DuoTact sensors for tactile sensing, and modular mechanical parts for stable and ergonomic bimanual operation.
  • Figure 5: The relationship between frames used in the transformation calibration process. Arrows in the picture denote transforms between frames, where the red one denotes the desired hand-eye transform ${}^{Q}T_{EE}$.
  • ...and 3 more figures