Multimodal Visual-haptic pose estimation in the presence of transient occlusion

Michael Zechmair; Yannick Morel

TL;DR

This work addresses reliable human pose estimation for safe human-robot collaboration under occlusion. It fuses two perception modalities, a vision pipeline based on Predictive Coding (PredNet) and a compact capacitive haptic sensor, through a modified nonlinear Luenberger observer to achieve occlusion-robust localization. Key contributions include: (i) a PredNet-based segmentation and 3D pose estimation framework, (ii) a near-range capacitive sensor system with a neural network that maps measurements to pose, (iii) an adaptive, noise-weighted fusion scheme using $R_v$ and $R_c$ to combine vision and haptics, and (iv) numerical simulations showing improved accuracy over either single modality under varying occlusion. By maintaining accurate pose estimates despite transient occlusion, the approach improves safety and reliability for cobots operating close to humans, with potential extensions to multi-target tracking and richer haptic pose modeling.

Abstract

Human-robot collaboration requires methods that guarantee the safety of participating operators. A necessary part of this is reliable human pose estimation. Established vision-based modalities degrade under occlusion. This article describes the combination of two perception modalities for pose estimation in environments containing such transient occlusion. We first introduce a vision-based pose estimation method built on a deep Predictive Coding (PC) model that is robust to partial occlusion. Next, we introduce capacitive sensing hardware capable of detecting various objects. The sensor is compact enough to be mounted on the exterior of any given robotic system. The technology is particularly well-suited to detection of capacitive material, such as living tissue. Pose estimates from the two individual sensing modalities are combined using a modified Luenberger observer model. We demonstrate that the combined estimate performs better than either sensor alone. The efficacy of the system is demonstrated in an environment containing a robot arm and a human, showing the ability to estimate the pose of a human forearm under varying levels of occlusion.
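
The fusion idea described above can be illustrated with a minimal sketch. It is not the paper's modified Luenberger observer: it assumes a static target, treats `r_v` and `r_c` as scalar noise variances for the vision and capacitive estimates, and uses inverse-variance weighting of the two innovations. The function name `fuse_step` and the `gain` parameter are hypothetical.

```python
def fuse_step(x_hat, y_v, y_c, r_v, r_c, gain=0.5):
    """One correction step of a simple Luenberger-type observer (static target).

    x_hat: current position estimate (list of floats)
    y_v, y_c: vision and capacitive position estimates, or None if unavailable
    r_v, r_c: assumed scalar noise variances of each modality (must be > 0)
    gain: assumed observer gain in (0, 1]
    """
    # Inverse-variance weights; an unavailable measurement gets zero weight.
    w_v = 0.0 if y_v is None else 1.0 / r_v
    w_c = 0.0 if y_c is None else 1.0 / r_c
    total = w_v + w_c
    if total == 0.0:
        # Neither modality is available: hold the previous estimate.
        return list(x_hat)

    updated = []
    for i, x in enumerate(x_hat):
        innovation = 0.0
        if y_v is not None:
            innovation += (w_v / total) * (y_v[i] - x)
        if y_c is not None:
            innovation += (w_c / total) * (y_c[i] - x)
        # Luenberger-style correction: estimate plus gain times innovation.
        updated.append(x + gain * innovation)
    return updated


if __name__ == "__main__":
    estimate = [0.0, 0.0, 0.0]
    # Vision is occluded (None), so only the capacitive estimate corrects.
    estimate = fuse_step(estimate, None, [0.2, 0.1, 0.3], r_v=0.01, r_c=0.04)
    print(estimate)
```

When vision is occluded, passing `None` for `y_v` lets the capacitive estimate drive the correction on its own; when both are present, the modality with the smaller assumed variance dominates.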

Paper Structure (16 sections, 19 equations, 15 figures, 1 table)

Introduction
Perception Modalities
Active Electric Field Modality
Perception Strategy
Experimental Results
Sensor Model
Haptic-based Pose Estimation
Visual Modality
Perception Strategy
PredNet Model
Vision-based Pose Estimation
Multimodal Localization
Modified Luenberger-type Observer
Integration of Perception Methods
Numerical Simulation
...and 1 more section

Figures (15)

Figure 1: Integration of electrode board with signal processing hardware (left) and diagram (right) consisting of one excitation electrode (middle, light blue ring) and five measurement electrodes (dark blue areas, center, left, right, top, bottom).
Figure 2: Haptic sensing measures recorded when considering a human hand (left) and forearm (right). The sensor was moved across the horizontal plane, with electrodes above the target at a constant height of 1 cm.
Figure 3: Measures obtained with the haptic sensors above various considered objects. The robot arm was moved so that distance $d(t)$ between electrodes and object ranged from 0.5 to 20 cm. The smooth lines are generated by (\ref{eq:sensor_approx}).
Figure 4: Pose estimation error in relation to distance $d$ from haptic sensor, expressed as root-mean-square error (RMSE).
Figure 5: Neural network using robot joint angles and sensor measures to generate an object pose estimate. It contains four fully connected layers, each composed of 256 ReLU units.
...and 10 more figures
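
The network described in the Figure 5 caption can be sketched in plain Python as follows. Only the four fully connected layers of 256 ReLU units come from the caption. The input size (6 joint angles plus 5 electrode measures), the 3-dimensional linear output layer, the random initialization, and all function names are assumptions made for illustration; the weights here are untrained.

```python
import random


def make_layer(n_in, n_out, rng):
    """Create one dense layer with He-style random weights and zero biases."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, relu):
    """Apply a dense layer to vector x, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def build_mlp(n_joints=6, n_electrodes=5, hidden=256, n_hidden_layers=4,
              pose_dim=3, seed=0):
    """Four hidden layers of 256 ReLU units (per the caption) plus an
    assumed linear output layer; input and output sizes are assumptions."""
    rng = random.Random(seed)
    sizes = [n_joints + n_electrodes] + [hidden] * n_hidden_layers + [pose_dim]
    return [make_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def predict_pose(mlp, joint_angles, sensor_measures):
    """Forward pass: concatenate inputs, apply ReLU layers, then linear output."""
    x = list(joint_angles) + list(sensor_measures)
    for layer in mlp[:-1]:
        x = dense(x, layer, relu=True)
    return dense(x, mlp[-1], relu=False)


if __name__ == "__main__":
    mlp = build_mlp()
    pose = predict_pose(mlp, [0.0] * 6, [0.1] * 5)
    print(len(pose))
```

Feeding joint angles alongside the sensor measures lets the network account for where the sensor sits in space, which is presumably why the caption lists both as inputs.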
