Unsupervised Neural Motion Retargeting for Humanoid Teleoperation

Satoshi Yagi, Mitsunori Tada, Eiji Uchibe, Suguru Kanoga, Takamitsu Matsubara, Jun Morimoto

TL;DR

This work tackles human-to-humanoid teleoperation without paired training data or manual joint pre-specifications. It introduces a CycleGAN-based framework that learns a shared latent representation for human and humanoid motions, using separate posture and motion encoders and a set of losses that enforce reconstruction, latent consistency, adversarial realism, and end-effector velocity alignment. The approach enables real-time retargeting and is validated on upper-body motions, with end-effector position errors comparable to those of IK-based methods, and is demonstrated in a real pick-and-place task using a Torobo humanoid. The results suggest good usability and robustness to operator variation, while also highlighting data requirements and limitations in end-effector orientation that point to directions for future improvement and broader deployment in teleoperation.

Abstract

This study proposes an approach to human-to-humanoid teleoperation using GAN-based online motion retargeting, which removes the need to construct pairwise datasets for identifying the relationship between human and humanoid kinematics. The proposed teleoperation system is therefore expected to reduce the complexity and setup requirements typically associated with humanoid controllers, facilitating more accessible and intuitive teleoperation for users without robotics knowledge. The experiments demonstrated that the proposed method can retarget a range of upper-body human motions to a humanoid, including a body jab motion and a basketball shoot motion. Moreover, human-in-the-loop teleoperation performance was evaluated by measuring the end-effector position errors between the human motions and the retargeted humanoid motions. The errors were comparable to those of conventional motion retargeting methods that require pairwise motion datasets. Finally, a box pick-and-place task was conducted to demonstrate the usability of the developed humanoid teleoperation system.
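The end-effector position error mentioned in the abstract can be illustrated with a minimal sketch. It assumes the human and humanoid end-effector trajectories are already expressed in the same frame and units and are sampled at the same time steps; the function name `end_effector_position_error` and the sample values are hypothetical, and any scaling for the size difference between operator and robot used in the paper is not reproduced here.

```python
import numpy as np

def end_effector_position_error(human_ee, robot_ee):
    """Mean Euclidean distance between two matched end-effector trajectories.

    Both inputs have shape (T, 3): one 3D position per time step.
    Assumption: same coordinate frame, same units, same sampling times.
    """
    human_ee = np.asarray(human_ee, dtype=float)
    robot_ee = np.asarray(robot_ee, dtype=float)
    if human_ee.shape != robot_ee.shape:
        raise ValueError("trajectories must have the same shape")
    # Per-frame distance, then average over the trajectory.
    return float(np.linalg.norm(human_ee - robot_ee, axis=-1).mean())

# Illustrative values only (metres); not data from the paper.
human = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
robot = np.array([[0.0, 0.0, 0.0], [0.1, 0.03, 0.04]])
error = end_effector_position_error(human, robot)
```

In this toy case the first frame matches exactly and the second frame is 0.05 m apart, so the mean error is 0.025 m.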

Paper Structure

This paper contains 14 sections, 6 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Operating a humanoid with the developed controller. Our teleoperation controller does not require paired data sets or common pre-specifications for its learning process to achieve human-to-humanoid motion retargeting.
  • Figure 2: Overview of Motion Retargeting: First, the human motion encoders pool two joints of each limb into one. Then, based on the deep motion representations articulated by the original skeleton, the humanoid motion decoders perform an unpooling operation, separating each joint back into two. Ultimately, this process enables the decoders to generate motions that accurately correspond to the actual humanoid skeletal structure.
  • Figure 3: Architecture of the network during training and teleoperation: We adopted the same network architecture as the previous study (Aberman et al., 2020), which was designed for unpaired motion-to-motion translation between two different domains using a CycleGAN approach. While the previous study achieved motion translation between the same animated characters, this study focuses on motion translation from humans to humanoids, addressing different targets. (a) Training phase: During this phase, each generator translates motions from one domain to the other, aiming to keep the translations consistent across domains. Simultaneously, each discriminator distinguishes the original motions of its assigned domain from the retargeted motions produced by the generator. (b) Teleoperation phase: In this phase, the network uses the humanoid motion generator obtained during training. For motion retargeting, we input source human motion and target human posture information to adapt the humanoid's motions in real time.
  • Figure 4: Overview of the three-layer encoder architecture: This figure shows one of the three layers. Inputs (BVH files) are separated into posture (red) and motion (blue) components. As in Aberman et al. (2020), these components are processed in parallel by the upper and lower networks, respectively. The upper network handles posture information and is connected to the lower network, which handles motion, allowing posture data to be integrated during motion encoding.
  • Figure 5: System configuration diagram of the humanoid controller. The motion capture system measures human joint angles $q_h$ at 50 Hz. The pre-trained network of the teleoperation PC outputs the desired Torobo joint angles $q_r$ to the controller PC at 25 Hz.
  • ...and 5 more figures
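
The pooling and unpooling described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes per-joint features stored as a (joints, channels) array for a single limb, pooling by averaging each consecutive pair of joints, and unpooling by copying each pooled feature twice. The function names `pool_pairs` and `unpool_pairs` are hypothetical, and the caption itself states only that two joints are pooled into one and later separated back into two.

```python
import numpy as np

def pool_pairs(features):
    """Merge each consecutive pair of joints by averaging: (J, C) -> (J // 2, C).

    Assumption: J is even and joints are ordered along the limb.
    """
    features = np.asarray(features, dtype=float)
    joints, channels = features.shape
    if joints % 2 != 0:
        raise ValueError("expected an even number of joints")
    return features.reshape(joints // 2, 2, channels).mean(axis=1)

def unpool_pairs(pooled):
    """Split each pooled joint back into two by copying: (J, C) -> (2 * J, C)."""
    return np.repeat(np.asarray(pooled, dtype=float), 2, axis=0)

# Four joints of one limb with two feature channels each (illustrative values).
limb = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
pooled = pool_pairs(limb)        # shape (2, 2)
restored = unpool_pairs(pooled)  # shape (4, 2)
```

Unpooling restores the joint count but not the original per-joint values; in a learned decoder, the layers that follow the unpooling step would be responsible for producing joint-specific outputs for the humanoid skeleton.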