CompdVision: Combining Near-Field 3D Visual and Tactile Sensing Using a Compact Compound-Eye Imaging System

Lifan Luo; Boyang Zhang; Zhijie Peng; Yik Kin Cheung; Guanlan Zhang; Zhigang Li; Michael Yu Wang; Hongyu Yu

TL;DR

CompdVision addresses the need for compact, multi-modal robotic sensing by integrating near-field 3D visual depth sensing and tactile deformation sensing in a single compound-eye system. It uses dual-focus microlens arrays so that stereo depth estimation and marker-tracking-based tactile sensing run simultaneously without modality conversion; its 3×5 grid of vision units includes far-focus stereo units and near-focus tactile units. Depth estimation is calibrated for the small stereo baseline and computed with SGBM followed by WLS filtering, achieving high fill rates and low depth (Z) error. Tactile forces are inferred from marker displacements by a CNN trained on 16,000 samples, yielding accurate force predictions. The results demonstrate depth sensing and contact-force estimation in a compact form factor suited to touch-intensive robotic manipulation; future work aims to improve low-light performance and marker robustness.

Abstract

As automation technologies advance, the need for compact and multi-modal sensors in robotic applications is growing. To address this demand, we introduce CompdVision, a novel sensor that employs a compound-eye imaging system to combine near-field 3D visual and tactile sensing within a compact form factor. CompdVision utilizes two types of vision units to address diverse sensing needs, eliminating the need for complex modality conversion. Stereo units with far-focus lenses can see through the transparent elastomer for depth estimation beyond the contact surface. Simultaneously, tactile units with near-focus lenses track the movement of markers embedded in the elastomer to obtain contact deformation. Experimental results validate the sensor's superior performance in 3D visual and tactile sensing, proving its capability for reliable external object depth estimation and precise measurement of tangential and normal contact forces. The dual modalities and compact design make the sensor a versatile tool for robotic manipulation.
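
The stereo units recover depth from disparity. A minimal sketch of that conversion is given below, assuming the standard rectified pinhole relation Z = f · B / d; the function names, the focal length, and the baseline values are hypothetical illustrations and are not taken from the paper. The `valid_fraction` helper is one plausible reading of a fill-rate style metric (share of pixels with a valid depth), which may differ from the paper's definition.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_mm):
    """Convert a disparity map (pixels) to depth (mm) with Z = f * B / d.

    Pixels with non-positive disparity are treated as invalid and set to NaN.
    focal_px and baseline_mm are assumed calibration values (hypothetical).
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.nan)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_mm / disparity[valid]
    return depth


def valid_fraction(depth):
    """Fraction of pixels that carry a finite depth value."""
    return float(np.mean(np.isfinite(depth)))


if __name__ == "__main__":
    # Toy 2x2 disparity map; 0 marks a pixel with no stereo match.
    disp = [[10.0, 0.0], [20.0, 5.0]]
    depth = disparity_to_depth(disp, focal_px=500.0, baseline_mm=2.0)
    print(depth)
    print(valid_fraction(depth))
```

Because depth is inversely proportional to disparity, a small baseline yields small disparities, so sub-pixel disparity errors translate into larger depth errors as the object moves away from the sensor.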

Paper Structure (20 sections, 2 equations, 7 figures, 3 tables)

This paper contains 20 sections, 2 equations, 7 figures, 3 tables.

Introduction
Related work
Vision-Based Tactile Sensor
Sensor Combining Visual and Tactile Sensing
Sensor Design and Fabrication
Sensing Principle
3D Visual Modality
Tactile Modality
Image Stitching
Blob Detection
Marker Displacement Extraction
Simultaneous 3D Visual and Tactile Sensing
Experiments
Depth Estimation Calibration
Fill Rate
...and 5 more sections

Figures (7)

Figure 1: (a) Human thumb next to CompdVision. (b) CompdVision (red) is mounted on a gripper. (c) The compound-eye imaging system utilizes side stereo units with far-focus lenses for depth estimation, while the central tactile units, equipped with near-focus lenses, reduce external noise to enable precise tracking of marker movements for tactile sensing.
Figure 2: (a) Exploded view of the CompdVision sensor. (b) Prototype and dimensions of the sensor.
Figure 3: Scheme of stereo depth estimation: (a) Extraction of image tiles of the left stereo units. (b) Transposition and rectification of image tiles of the bottom and upper units to achieve horizontal alignment. (c) Application of the SGBM algorithm to generate the disparity and depth maps, with marker areas removed. (d) Final 3D reconstruction result derived from the left stereo units.
Figure 4: Scheme of tactile sensing: (a) An initial image is used to crop ROIs based on the centers of common markers. These cropped ROIs are then stitched together in the non-overlapping areas to form a single image. (b) Utilizing the same cropping regions, a stitched image is formed after contact. (c) The RGB image is converted to HSV color space and thresholded to isolate the markers, followed by using SimpleBlobDetector for marker localization. (d) The detected blob points are compared and matched with those from the initial image, establishing correspondences based on nearest neighbors.
Figure 5: The experiment setup of stereo units dataset collection.
...and 2 more figures
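
The nearest-neighbor matching step described in Figure 4(d) can be sketched as follows. This is an illustrative NumPy version, not the authors' implementation: the function name `match_markers` and the `max_dist` rejection threshold are assumptions, and the marker centers are taken as already detected (for example by a blob detector, as in Figure 4(c)).

```python
import numpy as np


def match_markers(initial_pts, current_pts, max_dist=10.0):
    """Match each initial marker to its nearest current marker.

    Returns an (N, 2) array of displacements (current - initial) in pixels.
    Rows whose nearest neighbor is farther than max_dist (a hypothetical
    threshold) are set to NaN to flag a lost or mismatched marker.
    """
    initial_pts = np.asarray(initial_pts, dtype=np.float64).reshape(-1, 2)
    current_pts = np.asarray(current_pts, dtype=np.float64).reshape(-1, 2)
    if len(current_pts) == 0:
        # No detections after contact: every marker is unmatched.
        return np.full(initial_pts.shape, np.nan)

    # Pairwise Euclidean distances, shape (N_initial, N_current).
    diff = initial_pts[:, None, :] - current_pts[None, :, :]
    dist = np.linalg.norm(diff, axis=2)

    nearest = dist.argmin(axis=1)
    nearest_dist = dist[np.arange(len(initial_pts)), nearest]

    displacements = current_pts[nearest] - initial_pts
    displacements[nearest_dist > max_dist] = np.nan
    return displacements


if __name__ == "__main__":
    before = [[0.0, 0.0], [10.0, 0.0]]
    after = [[11.0, 1.0], [1.0, 0.0]]  # detection order need not be preserved
    print(match_markers(before, after))
```

Nearest-neighbor matching assumes that each marker moves less than roughly half the marker spacing between the reference and contact frames; larger displacements would need a more robust association scheme.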
