Autonomous robotic re-alignment for face-to-face underwater human-robot interaction

Demetrious T. Kutzke, Ashwin Wariar, Junaed Sattar

TL;DR

This paper tackles underwater human–robot interaction (UHRI) by enabling a robot to reorient itself autonomously into a face-to-face position with a diver, using a stereo vision pipeline that reconstructs the diver's 3D pose even when the diver is in a nonstandard pose. A DeepLabCut-based 2D torso pose estimator is trained on a large stereo diver dataset, and its keypoints are triangulated into 3D. From these, a diver-centered body frame is built and aligned with the camera frame using the Kabsch algorithm; an anti-alignment rotation $R_y(\pi)$ then produces a scale-preserving setpoint $s^{*}$ for visual servo control. Pool and open-water trials show that the method yields reasonable projection errors for 1–3 m baselines, with occlusion and depth ambiguity posing notable challenges. The work contributes a scale-preserving setpoint computation, a diverse stereo torso keypoint dataset, and a pose-to-coordinate convention that advances autonomous face-to-face UHRI communication. Proposed future refinements include ADR-based depth regularization and context-aware scene understanding.
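To make the alignment step concrete, here is a minimal NumPy sketch of the Kabsch algorithm followed by a rotation $R_y(\pi)$. It is an illustration under stated assumptions, not the authors' implementation: the function names `kabsch_rotation` and `rot_y`, the synthetic keypoints, and the order in which the two rotations are composed are all hypothetical choices made for this example.

```python
import numpy as np

def kabsch_rotation(P, Q):
    """Proper rotation R (3x3) minimizing sum ||R @ p_i - q_i||^2.

    P and Q are (N, 3) arrays of corresponding points; both are centered
    on their centroids before the rotation is estimated.
    """
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    H = P_c.T @ Q_c                      # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that det(R) = +1 (no reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

def rot_y(theta):
    """Rotation by theta radians about the y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

# Synthetic data (assumption): six arbitrary 3D keypoints in a body frame,
# observed in the camera frame after a known rotation and translation.
rng = np.random.default_rng(0)
body_pts = rng.normal(size=(6, 3))
R_true = rot_y(0.4)
cam_pts = body_pts @ R_true.T + np.array([0.1, -0.2, 2.0])

# Kabsch recovers the rotation taking body-frame points to camera-frame points.
R_est = kabsch_rotation(body_pts, cam_pts)

# Composing with R_y(pi) reverses the x- and z-axes, so a frame that was
# aligned with the camera's viewing direction now points back toward it.
# The composition order used here is an assumption for illustration.
R_face = rot_y(np.pi) @ R_est
```

In this sketch `R_est` matches `R_true` up to floating-point error, and `rot_y(np.pi)` maps the forward axis $(0, 0, 1)$ to $(0, 0, -1)$, which is the sense in which the rotation is "anti-aligning".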

Abstract

The use of autonomous underwater vehicles (AUVs) to accomplish traditionally challenging and dangerous tasks has proliferated thanks to advances in sensing, navigation, manipulation, and on-board computing technologies. Utilizing AUVs in underwater human-robot interaction (UHRI) has witnessed comparatively smaller levels of growth due to limitations in bi-directional communication and significant technical hurdles to bridge the gap between analogies with terrestrial interaction strategies and those that are possible in the underwater domain. A necessary component to support UHRI is establishing a system for safe robotic-diver approach to establish face-to-face communication that considers non-standard human body pose. In this work, we introduce a stereo vision system for enhancing UHRI that utilizes three-dimensional reconstruction from stereo image pairs and machine learning for localizing human joint estimates. We then establish a convention for a coordinate system that encodes the direction the human is facing with respect to the camera coordinate frame. This allows automatic setpoint computation that preserves human body scale and can be used as input to an image-based visual servo control scheme. We show that our setpoint computations tend to agree both quantitatively and qualitatively with experimental setpoint baselines. The methodology introduced shows promise for enhancing UHRI by improving robotic perception of human orientation underwater.
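The three-dimensional reconstruction step can be illustrated with a short sketch that lifts matched 2D keypoints to 3D. It assumes an ideal rectified pinhole stereo pair; the abstract does not specify the paper's actual reconstruction procedure or calibration model, and the function `triangulate_rectified`, its parameters, and the numeric values below are hypothetical.

```python
import numpy as np

def triangulate_rectified(uv_left, uv_right, f_px, stereo_baseline_m, cx, cy):
    """Triangulate matched keypoints from a rectified stereo pair.

    uv_left, uv_right: (N, 2) pixel coordinates of the same keypoints.
    f_px: focal length in pixels; stereo_baseline_m: camera separation in meters.
    cx, cy: principal point in pixels.
    Returns (N, 3) points in the left camera frame, in meters.
    """
    uv_left = np.asarray(uv_left, dtype=float)
    uv_right = np.asarray(uv_right, dtype=float)
    disparity = uv_left[:, 0] - uv_right[:, 0]
    if np.any(disparity <= 0):
        raise ValueError("disparity must be positive for every keypoint")
    Z = f_px * stereo_baseline_m / disparity   # depth from disparity
    X = (uv_left[:, 0] - cx) * Z / f_px        # back-project horizontally
    Y = (uv_left[:, 1] - cy) * Z / f_px        # back-project vertically
    return np.column_stack([X, Y, Z])

# Hypothetical calibration and one matched keypoint.
pts_3d = triangulate_rectified(
    uv_left=[[400.0, 240.0]],
    uv_right=[[360.0, 240.0]],
    f_px=800.0, stereo_baseline_m=0.1, cx=320.0, cy=240.0,
)
```

With these made-up values the disparity is 40 pixels, so the keypoint lies 2 m in front of the left camera and 0.2 m to its right. Depth scales inversely with disparity, so small disparity errors matter more as distance grows.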

Paper Structure (6 sections, 3 equations, 8 figures, 1 table)


Figures (8)

  • Figure 1: Underwater human robot interaction is enhanced by the robot's ability to re-orient itself with respect to the diver, rather than requiring the diver to re-orient with respect to the robot.
  • Figure 2: Example non-standard diver poses that are typical during scuba diving operations. Diver robot interaction scenarios must accommodate these poses to be useful for underwater missions.
  • Figure 3: Pose keypoint convention used by F2F along with a sample labeled image. The pose estimator may provide more anatomical keypoints, but F2F only requires these six.
  • Figure 4: Sample raw images from the F2F pose dataset. The dataset contains diverse poses to represent the broadest possible set of orientations a diver can assume while conducting underwater operations.
  • Figure 5: DeepLabCut evaluation on the F2F test dataset. Cross markers indicate ground truth labels, and dots indicate DeepLabCut estimations with confidences $p > p_{\text{cutoff}} = 0.05$.
  • ...and 3 more figures