MuxGel: Simultaneous Dual-Modal Visuo-Tactile Sensing via Spatially Multiplexing and Deep Reconstruction

Zhixian Hu; Zhengtong Xu; Sheeraz Athar; Juan Wachs; Yu She

TL;DR

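MuxGel is a spatially multiplexed sensor that uses a checkerboard coating to capture both external vision and contact-induced tactile signals through a single camera. To recover full-resolution vision and tactile signals from the multiplexed inputs, this work develops a U-Net-based reconstruction framework and demonstrates MuxGel's utility in grasping tasks, where dual-modality feedback facilitates both pre-contact alignment and post-contact interaction.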

Abstract

High-fidelity visuo-tactile sensing is important for precise robotic manipulation. However, most vision-based tactile sensors face a fundamental trade-off: opaque coatings enable tactile sensing but block pre-contact vision. To address this, we propose MuxGel, a spatially multiplexed sensor that captures both external visual information and contact-induced tactile signals through a single camera. By using a checkerboard coating pattern, MuxGel interleaves tactile-sensitive regions with transparent windows for external vision. This design maintains standard form factors, allowing for plug-and-play integration into GelSight-style sensors by simply replacing the gel pad. To recover full-resolution vision and tactile signals from the multiplexed inputs, we develop a U-Net-based reconstruction framework. Leveraging a sim-to-real pipeline, our model effectively decouples and restores high-fidelity tactile and visual fields simultaneously. Experiments on unseen objects demonstrate the framework's generalization and accuracy. Furthermore, we demonstrate MuxGel's utility in grasping tasks, where dual-modality feedback facilitates both pre-contact alignment and post-contact interaction. Results show that MuxGel enhances the perceptual capabilities of existing vision-based tactile sensors while maintaining compatibility with their hardware stacks. Project webpage: https://zhixianhu.github.io/muxgel/.
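The checkerboard interleaving described in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' simulation pipeline: the function names `checkerboard_mask` and `multiplex`, the per-pixel hard selection, and the choice of which cells are coated are all assumptions made for illustration. A real sensor would also show blur and lighting effects at cell boundaries, which are ignored here.

```python
import numpy as np


def checkerboard_mask(height, width, cells):
    """Return a boolean (height, width) mask for a cells x cells checkerboard.

    True marks pixels assumed to lie under the opaque (tactile) coating,
    False marks pixels assumed to lie under a transparent (vision) window.
    Which parity is coated is an arbitrary choice in this sketch.
    """
    rows = (np.arange(height) * cells) // height  # cell row index per pixel row
    cols = (np.arange(width) * cells) // width    # cell column index per pixel column
    return (rows[:, None] + cols[None, :]) % 2 == 0


def multiplex(visual, tactile, mask):
    """Compose an idealized multiplexed image from two (H, W, 3) images.

    Pixels under the coating come from the tactile image; the rest come
    from the visual image.
    """
    return np.where(mask[..., None], tactile, visual)


if __name__ == "__main__":
    mask = checkerboard_mask(240, 320, cells=4)  # 4x4 configuration
    visual = np.zeros((240, 320, 3))
    tactile = np.ones((240, 320, 3))
    raw = multiplex(visual, tactile, mask)
    print(raw.shape, mask.mean())  # half of the pixels are tactile
```

Under this idealization each modality is observed on only half of the pixels, which is why a reconstruction step is needed to recover both full-resolution images.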

Paper Structure (9 sections, 11 equations, 8 figures, 2 tables)

Introduction
Method
Hardware Design
Simulation Data Generation Pipeline
Reconstruction Framework
Real Data Collection and Model Adaptation
Performance Evaluation on Unseen Objects
Visual-Tactile Servoing Grasping Experiment
Conclusion

Figures (8)

Figure 1: Grasping a chip with MuxGel: (a) A Robotiq gripper grasping a chip using MuxGel for simultaneous visuo-tactile sensing via spatial multiplexing. (b) MuxGel with the 4x4 checkerboard configuration integrated into a GelSight Mini by replacing only the gel pad, without optical or mechanical redesign. (c) Raw multiplexed sensor output. (d) Reconstructed tactile image. (e) Reconstructed visual appearance image.
Figure 2: MuxGel configurations and raw observations. (a) A 4x4 sensor integrated with a Robotiq gripper during a grasp. (b)-(f) Hardware patches (left) and captured images (right) for pure vision, pure tactile, and multiplexed (2x2, 4x4, 8x8) configurations.
Figure 3: Large-scale physics-based simulation pipeline for visual-tactile data generation. Bg: Background; Obj: Object; Tac: Tactile; Ref: Reference.
Figure 4: Overview of the dual-stream muxNet architecture. A shared ResNet-34 encoder processes either a 3-channel fused image or a 6-channel tensor formed by concatenating the multiplexed image with a non-contact reference image. Two task-specific decoders reconstruct the visual and tactile modalities, respectively. The tactile branch supports absolute prediction (Option A) and residual prediction (Option B), where the predicted contact residual is added to the non-contact tactile image. Asterisks ($*$) denote activation functions specific to real-data fine-tuning. Chns: channels. BN: BatchNorm. Conv: Convolution.
Figure 5: Real-world data collection and dataset overview. (a) Automated 3-axis linear motion platform, where the x,y-axes align with the sensor contact plane for precise spatial sampling. (b) Indenter diversity. Top: five distinct geometries (sphere, edge, square, hollow octagon, solid octagon). Bottom: one geometry (hollow octagon) in five colors (white, black, red, green, blue) to improve visual robustness. Black scale bars correspond to 1 cm. (c) Representative data samples. Top: a white sphere indenter at 1.5 mm depth across five gel configurations. Bottom: a hollow octagon indenter in five colors captured with the 4x4 patterned gel pad at 1.5 mm depth.
...and 3 more figures
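
The input and tactile-output options named in the Figure 4 caption can be sketched in a few lines of NumPy. This is only a shape-level illustration under stated assumptions: the helper names `build_input` and `tactile_output` are hypothetical, a channels-first layout is assumed, and the encoder and decoders themselves are not modeled.

```python
import numpy as np


def build_input(multiplexed, reference=None):
    """Form the encoder input from channels-first (3, H, W) images.

    Without a reference, the 3-channel image is passed through unchanged.
    With a non-contact reference image, the two are concatenated along
    the channel axis into a 6-channel tensor.
    """
    if reference is None:
        return multiplexed
    return np.concatenate([multiplexed, reference], axis=0)


def tactile_output(decoder_out, no_contact_tactile=None):
    """Turn the tactile decoder output into the final tactile image.

    Option A (absolute): the decoder output is the tactile image.
    Option B (residual): the decoder output is a contact residual that is
    added to the non-contact tactile image.
    """
    if no_contact_tactile is None:
        return decoder_out
    return no_contact_tactile + decoder_out


if __name__ == "__main__":
    mux = np.zeros((3, 240, 320))
    ref = np.zeros((3, 240, 320))
    print(build_input(mux).shape)       # 3-channel input
    print(build_input(mux, ref).shape)  # 6-channel input
```

In the residual option the decoder only has to predict the change caused by contact, so a zero residual reproduces the non-contact tactile image.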
