A Vision-Based Shared-Control Teleoperation Scheme for Controlling the Robotic Arm of a Four-Legged Robot

Murilo Vinicius da Silva, Matheus Hipolito Carvalho, Juliano Negri, Thiago Segreto, Gustavo J. G. Lahr, Ricardo V. Godoy, Marcelo Becker

TL;DR

This work tackles the challenge of teleoperating a quadruped robot with a manipulation arm in hazardous environments by introducing a vision-based, shared-control interface that maps the operator's wrist pose and hand orientation to the robot end-effector. It combines depth-camera perception, ArUco-based calibration, MediaPipe tracking, and gesture recognition to enable manual and semi-autonomous grasping modes, while a collision-aware planner ensures safe operation. The approach is implemented on a Boston Dynamics Spot platform and validated through both simulation and real-time experiments, achieving a mean wrist-to-end-effector error of approximately 0.07 m and successful pick-and-place tasks by two users. The results suggest a practical, low-cost alternative for high-risk industrial applications, with future work focusing on path planning and enhanced shared-control strategies.

Abstract

In hazardous and remote environments, robotic systems perform critical tasks demanding improved safety and efficiency. Among these, quadruped robots with manipulator arms offer mobility and versatility for complex operations. However, teleoperating quadruped robots is challenging due to the lack of integrated obstacle detection and intuitive control methods for the robotic arm, increasing collision risks in confined or dynamically changing workspaces. Teleoperation via joysticks or pads can be non-intuitive and demands a high level of expertise due to its complexity, culminating in a high cognitive load on the operator. To address this challenge, a teleoperation approach that directly maps human arm movements to the robotic manipulator offers a simpler and more accessible solution. This work proposes an intuitive remote control by leveraging a vision-based pose estimation pipeline that utilizes an external camera with a machine learning-based model to detect the operator's wrist position. The system maps these wrist movements into robotic arm commands to control the robot's arm in real-time. A trajectory planner ensures safe teleoperation by detecting and preventing collisions with both obstacles and the robotic arm itself. The system was validated on the real robot, demonstrating robust performance in real-time control. This teleoperation approach provides a cost-effective solution for industrial applications where safety, precision, and ease of use are paramount, ensuring reliable and intuitive robotic control in high-risk environments.
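The mapping from a camera-frame wrist position to a robot-frame end-effector target can be illustrated with a minimal sketch. It assumes the calibration step yields a fixed 4x4 homogeneous transform from the camera frame to the robot base frame and that targets are clamped to a box-shaped workspace. The transform values, the workspace limits, and all function names below are hypothetical placeholders, not the authors' implementation.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical calibration result: camera frame rotated 90 degrees about z
# relative to the robot base, offset 1.0 m along x and 0.5 m along z.
T_ROBOT_CAMERA = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]

# Hypothetical reachable workspace of the arm, in the robot base frame (metres).
WORKSPACE_MIN: Vec3 = (0.2, -0.6, 0.0)
WORKSPACE_MAX: Vec3 = (0.9, 0.6, 1.0)


def transform_point(T: Sequence[Sequence[float]], p: Vec3) -> Vec3:
    """Apply a 4x4 homogeneous transform T to a 3D point p."""
    x, y, z = p
    return tuple(
        T[r][0] * x + T[r][1] * y + T[r][2] * z + T[r][3] for r in range(3)
    )


def clamp_to_workspace(p: Vec3, lower: Vec3, upper: Vec3) -> Vec3:
    """Clamp each coordinate of p to the interval [lower, upper]."""
    return tuple(min(max(v, lo), hi) for v, lo, hi in zip(p, lower, upper))


def wrist_to_end_effector_target(p_wrist_camera: Vec3) -> Vec3:
    """Map a wrist position in the camera frame to a clamped robot-frame target."""
    p_robot = transform_point(T_ROBOT_CAMERA, p_wrist_camera)
    return clamp_to_workspace(p_robot, WORKSPACE_MIN, WORKSPACE_MAX)


if __name__ == "__main__":
    print(wrist_to_end_effector_target((0.2, 0.1, 0.3)))
```

Clamping is only one simple way to keep targets reachable; the collision-aware trajectory planner described in the abstract is a separate component and is not modelled here.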

Paper Structure

This paper contains 17 sections, 5 equations, 6 figures.

Figures (6)

  • Figure 1: Experimental setup employed for testing and validating the proposed framework. The camera, located at $P_\text{camera}$ (shown over the camera), captures the wrist position $P_\text{wrist}$ (shown over the user's wrist), which is used to control the robot's end-effector position $P_\text{robot}$ (shown over the robot's end-effector). The goal of the experiment is to grasp the object located at $P_\text{object}$ (shown over the everyday object) and place it at the goal location.
  • Figure 2: Overview of the proposed control pipeline, illustrating the flow from motion capture to robot actuation. A manual teleoperation module allows the user to control the robot intuitively using an external camera. The user can trigger a semi-autonomous module, which grasps an object autonomously, by closing their hand. This framework was employed to perform a pick-and-place task using different objects.
  • Figure 3: MediaPipe 3D landmarks numbered in order of the output. The landmarks were employed to track the user's hand in order to perform intuitive teleoperation of the robot's arm.
  • Figure 4: Overview of the teleoperation framework using a robotic arm. In a), the operator raises their index finger to the camera, activating manual control mode. b) shows the robot aligning its arm with the estimated wrist position of the user. In c), the arm approaches the target object. The grasp is executed in d) through a closed hand gesture. In e), the operator performs an open hand gesture to initiate release, which results in the gripper opening and the object being dropped in f).
  • Figure 5: Position of the user's wrist as decoded by the pose estimation pipeline and as measured by the OptiTrack system, together with the translated position of the robot's end-effector, also measured by the OptiTrack system. These measurements were used to calculate the positional error, taking the OptiTrack values as ground truth.
  • ...and 1 more figures
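
The positional error described in the Figure 5 caption can be sketched as follows. The sketch assumes that the error is the Euclidean distance between time-aligned samples, averaged over the trajectory, with the motion-capture samples taken as ground truth. The function name and the sample values are hypothetical and are not data from the paper.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def mean_position_error(
    estimated: Sequence[Vec3], ground_truth: Sequence[Vec3]
) -> float:
    """Mean Euclidean distance between time-aligned 3D position samples."""
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    distances = [math.dist(e, g) for e, g in zip(estimated, ground_truth)]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    # Hypothetical samples in metres, not measurements from the paper.
    estimated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    ground_truth = [(0.0, 0.0, 0.1), (1.0, 0.3, 0.4)]
    print(mean_position_error(estimated, ground_truth))
```

In practice the two streams would first need to be resampled to common timestamps and expressed in the same reference frame; that alignment step is assumed to have been done before calling the function.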