Multimodal visual-haptic pose estimation in the presence of transient occlusion
Michael Zechmair, Yannick Morel
TL;DR
This work addresses reliable human pose estimation for safe human-robot collaboration in the presence of occlusion. It fuses two perception modalities, a vision pipeline based on Predictive Coding (ProcNet) and a compact capacitive haptic sensor, through a modified nonlinear Luenberger observer to achieve occlusion-robust localization. Key contributions include: (i) a ProcNet-based segmentation and 3D pose estimation framework, (ii) a near-range capacitive sensor system with a neural network that maps sensor readings to pose, (iii) an adaptive, noise-weighted fusion scheme that uses $R_v$ and $R_c$ to combine vision and haptics, and (iv) numerical simulations showing improved accuracy over either single modality under varying occlusion. By maintaining accurate pose estimates despite transient occlusion, the approach improves safety and reliability for cobots operating in close proximity to humans. Potential extensions include multi-target tracking and richer haptic pose modeling.
Abstract
Human-robot collaboration requires methods that guarantee the safety of participating operators, and reliable human pose estimation is a necessary part of this. Established vision-based modalities degrade under occlusion. This article describes the combination of two perception modalities for pose estimation in environments containing such transient occlusion. We first introduce a vision-based pose estimation method built on a deep Predictive Coding (PC) model that is robust to partial occlusion. Next, we introduce capacitive sensing hardware capable of detecting various objects. The sensor is compact enough to be mounted on the exterior of any given robotic system, and the technology is particularly well suited to detecting capacitive material such as living tissue. Pose estimates from the two sensing modalities are combined using a modified Luenberger observer model. We demonstrate that the fused estimate performs better than either sensor alone. The efficacy of the system is demonstrated in an environment containing a robot arm and a human, showing the ability to estimate the pose of a human forearm under varying levels of occlusion.
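To make the fusion idea concrete, the following is a minimal sketch of a noise-weighted, Luenberger-style observer update. It is not the authors' modified nonlinear observer. It assumes a static pose model, direct pose measurements from both modalities, and inverse-covariance weighting by $R_v$ and $R_c$. The function names, the scalar `gain`, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def fusion_weights(R_v, R_c):
    """Inverse-covariance weights for the vision and capacitive measurements.

    Assumption: each modality is weighted by the inverse of its noise
    covariance, so the weights sum to the identity matrix.
    """
    R_v_inv = np.linalg.inv(R_v)
    R_c_inv = np.linalg.inv(R_c)
    S = np.linalg.inv(R_v_inv + R_c_inv)
    return S @ R_v_inv, S @ R_c_inv

def observer_step(x_hat, y_v, y_c, R_v, R_c, gain=0.5):
    """One discrete observer update under a static pose model (x[k+1] = x[k]).

    The estimate is corrected by a weighted sum of the two innovations;
    `gain` is a hypothetical scalar observer gain in (0, 1].
    """
    W_v, W_c = fusion_weights(R_v, R_c)
    innovation = W_v @ (y_v - x_hat) + W_c @ (y_c - x_hat)
    return x_hat + gain * innovation

# Illustrative scenario: vision is occluded, so R_v is set very large.
true_pose = np.array([0.4, 0.1, 0.3])   # hypothetical forearm position (m)
y_v = np.zeros(3)                       # occluded vision returns a bad reading
y_c = true_pose.copy()                  # capacitive reading, noise-free here
R_v = 1e4 * np.eye(3)                   # low confidence in vision
R_c = 1e-2 * np.eye(3)                  # high confidence in capacitive sensing

x_hat = np.zeros(3)
for _ in range(50):
    x_hat = observer_step(x_hat, y_v, y_c, R_v, R_c)

W_v, W_c = fusion_weights(R_v, R_c)
```

In this sketch, inflating $R_v$ during occlusion drives the vision weight toward zero, so the estimate follows the capacitive measurement. When both covariances are comparable, the two modalities contribute in roughly equal measure.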
