From Human Hands to Robotic Limbs: A Study in Motor Skill Embodiment for Telemanipulation

Haoyi Shi, Mingxi Su, Ted Morris, Vassilios Morellas, Nikolaos Papanikolopoulos

TL;DR

This work teleoperates a redundant $7$-DOF Kinova manipulator by learning a latent trajectory space with a GRU-based Variational Autoencoder (VAE) and mapping human arm gestures into that space with a fully connected network. Decoding latent trajectories in real time yields the corresponding robot joint configurations, including configurations not present in the training set. Across tasks and participants, the system achieves a mean end-effector error of $2.51$ cm and a cosine similarity of $0.97$, indicating accurate, generalizable teleoperation. The approach reduces the data required for high-DOF control and offers a scalable framework, ready for imitation learning, for human-robot co-adaptation in teleoperation.
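The TL;DR reports two metrics without defining them. A minimal Python sketch of how they could be computed, assuming the end-effector error is the mean Euclidean distance between time-aligned operator-hand and robot end-effector positions, and the cosine similarity is taken between the concatenated per-step displacement vectors of the two trajectories. Both metric definitions and all function names are assumptions, not the paper's evaluation code.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def mean_end_effector_error(pred: List[Point], ref: List[Point]) -> float:
    """Mean Euclidean distance between time-aligned positions (same units as input, e.g. cm)."""
    if not pred or len(pred) != len(ref):
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, r) for p, r in zip(pred, ref)) / len(pred)


def displacements(traj: List[Point]) -> List[float]:
    """Concatenate the per-step displacement vectors p[t+1] - p[t] into one flat vector."""
    return [b - a for p0, p1 in zip(traj[:-1], traj[1:]) for a, b in zip(p0, p1)]


def trajectory_cosine_similarity(pred: List[Point], ref: List[Point]) -> float:
    """Cosine similarity between flattened displacement vectors (assumed metric definition)."""
    u, v = displacements(pred), displacements(ref)
    if len(u) != len(v):
        raise ValueError("trajectories must have the same length and dimension")
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("cosine similarity is undefined for a stationary trajectory")
    return dot / (nu * nv)


if __name__ == "__main__":
    # Hypothetical data: identical straight-line motion, robot offset by 1 cm along z.
    ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    pred = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
    print(mean_end_effector_error(pred, ref))  # 1.0
    print(trajectory_cosine_similarity(pred, ref))  # close to 1 (same direction of motion)
```

Using displacements rather than raw positions makes the similarity insensitive to a constant offset between the two trajectories, so the example scores near 1 while still reporting a 1 cm position error.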

Abstract

This paper presents a teleoperation system for controlling a kinematically redundant robot manipulator using human arm gestures. We propose a GRU-based Variational Autoencoder (VAE) to learn a latent representation of the manipulator's configuration space that captures its complex joint kinematics. A fully connected neural network maps human arm configurations into this latent space, and the VAE decoder then generates the corresponding manipulator trajectories in real time. The proposed method shows promising results in teleoperating the manipulator and can generate novel manipulator configurations from human features that were not present during training.
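The mapping stage described in the abstract can be sketched as a plain multilayer perceptron using the layer widths from the Figure 3 caption (hidden layers of 32, 40, and 20 neurons, and a 10-dimensional output matching the latent size). The ReLU activations, He-uniform initialization, 21-dimensional input (seven human joint positions in 3D), and the class name `HumanToLatentMLP` are hypothetical choices for illustration. The weights are untrained, and the GRU decoder that turns the latent vector into joint angles is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]

HIDDEN_SIZES = (32, 40, 20)  # hidden-layer widths from the Figure 3 caption
LATENT_SIZE = 10             # output size = latent feature size (Figure 3 caption)


def init_dense(n_in: int, n_out: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """He-uniform weights and zero biases (the initialization scheme is an assumption)."""
    bound = math.sqrt(6.0 / n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: Sequence[float], weights: Matrix, bias: Vector) -> Vector:
    """Affine map y = W x + b, with W stored as n_out rows of n_in weights."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class HumanToLatentMLP:
    """Hypothetical mapper from a human arm feature vector to a VAE latent vector."""

    def __init__(self, n_in: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        sizes = [n_in, *HIDDEN_SIZES, LATENT_SIZE]
        self.layers = [init_dense(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]

    def __call__(self, features: Sequence[float]) -> Vector:
        h = list(features)
        for i, (weights, bias) in enumerate(self.layers):
            h = dense(h, weights, bias)
            if i < len(self.layers) - 1:  # ReLU on hidden layers only (assumed activation)
                h = [max(0.0, v) for v in h]
        return h  # linear output layer: unconstrained latent coordinates


if __name__ == "__main__":
    mapper = HumanToLatentMLP(n_in=21)  # 21 = 7 joints x (x, y, z), an assumption
    z = mapper([0.1] * 21)
    print(len(z))  # 10
```

In a full pipeline, `z` would be passed to the trained VAE decoder; keeping the output layer linear lets the network reach any point in the latent space rather than only the non-negative orthant.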

Paper Structure

This paper contains 14 sections, 5 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Orange labels, $J_1$ to $J_7$, indicate the positions of the Kinova manipulator's joints. Green labels, $q_1$ to $q_7$, indicate the positions of the human arm's joints. The arrow shows the mapping between the manipulator's joints and the kinematic chain of a human upper limb.
  • Figure 2: The GRU-based VAE takes a 2-time-step trajectory of manipulator joint angles as input. The encoder learns an approximate latent distribution, latent features are sampled from it using the reparameterization trick, and the decoder reconstructs the input trajectory from these samples. The learned latent distribution enables approximation of the entire manipulator configuration space.
  • Figure 3: The fully connected module consists of an input layer and three hidden layers with $32$, $40$, and $20$ neurons. The output layer has $10$ units, matching the latent feature size.
  • Figure 4: Training results of the GRU-based VAE with (top) and without (bottom) an annealing scheduler. The $Z$ axis shows the correlation coefficient between each latent feature, $L_1$ to $L_{10}$, and each Kinova robot joint, $J_1$ to $J_7$.
  • Figure 5: Red: the operator's hand position in Cartesian space. Green: the proposed teleoperation pipeline. Blue: the fully connected network without the VAE decoder.
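The Figure 2 and Figure 4 captions refer to the reparameterization trick and an annealing scheduler. A minimal sketch of both, assuming a diagonal Gaussian posterior, a squared-error reconstruction term, and a linear KL warm-up; this summary does not state the paper's actual schedule or loss weights, and the function names are hypothetical. The GRU encoder and decoder that would produce `mu`, `log_var`, and `x_hat` are omitted.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), where sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def kl_weight(step: int, warmup_steps: int) -> float:
    """Linear KL annealing: the weight rises from 0 to 1 over warmup_steps, then stays at 1."""
    return min(1.0, step / warmup_steps)


def vae_loss(x: Sequence[float], x_hat: Sequence[float], mu: Sequence[float],
             log_var: Sequence[float], step: int, warmup_steps: int) -> float:
    """Squared-error reconstruction plus annealed KL term (the loss form is an assumption)."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + kl_weight(step, warmup_steps) * kl_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    z = reparameterize([0.0] * 10, [0.0] * 10, rng)  # one sample from a 10-D standard normal
    print(len(z))  # 10
    print(kl_standard_normal([1.0], [0.0]))  # 0.5
    print(kl_weight(250, 1000))  # 0.25
```

Because the noise enters only through `eps`, gradients can flow through `mu` and `log_var` during training. Annealing lets the reconstruction term dominate early in training, a common remedy for posterior collapse; Figure 4 compares latent-joint correlations with and without such a scheduler.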