Learning Adaptive Neural Teleoperation for Humanoid Robots: From Inverse Kinematics to End-to-End Control

Sanjar Atamuradov

TL;DR

This paper tackles the limitations of traditional VR teleoperation for humanoid robots, where IK+PD pipelines struggle with force disturbances, motion artifacts, and user-specific adaptation. It proposes an end-to-end neural teleoperation framework that maps VR controller poses and robot proprioception directly to joint commands, using a VR encoder, a proprioception encoder, and an LSTM head for temporal coherence. Training proceeds in three stages: imitation from IK demonstrations, RL fine-tuning with smoothness and tracking rewards, and a force adaptation curriculum. Sim-to-real transfer relies on domain randomization and asymmetric critics, and the deployed policy runs in real time at 50 Hz on the Unitree G1. Compared to the IK baseline, the learned policy shows 34% lower tracking error and 45% smoother motions, along with high user preference (87%), successful sim-to-real transfer, and robust force adaptation across manipulation tasks. These results highlight the potential of learned teleoperation for natural, robust human-robot collaboration.

Abstract

Virtual reality (VR) teleoperation has emerged as a promising approach for controlling humanoid robots in complex manipulation tasks. However, traditional teleoperation systems rely on inverse kinematics (IK) solvers and hand-tuned PD controllers, which struggle to handle external forces, adapt to different users, and produce natural motions under dynamic conditions. In this work, we propose a learning-based neural teleoperation framework that replaces the conventional IK+PD pipeline with learned policies trained via reinforcement learning. Our approach learns to directly map VR controller inputs to robot joint commands while implicitly handling force disturbances, producing smooth trajectories, and adapting to user preferences. We train our policies in simulation using demonstrations collected from IK-based teleoperation as initialization, then fine-tune them with force randomization and trajectory smoothness rewards. Experiments on the Unitree G1 humanoid robot demonstrate that our learned policies achieve 34% lower tracking error, 45% smoother motions, and superior force adaptation compared to the IK baseline, while maintaining real-time performance (50 Hz control frequency). We validate our approach on manipulation tasks including object pick-and-place, door opening, and bimanual coordination. These results suggest that learning-based approaches can significantly improve the naturalness and robustness of humanoid teleoperation systems.

Paper Structure

This paper contains 36 sections, 15 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Neural teleoperation policy architecture. The network takes VR controller poses (14-dim), joint states (28-dim), and task context (6-dim) as inputs. A proprioception encoder processes the robot state, followed by two hidden layers (512 and 256 units). Outputs are joint position targets and feedforward torques (14-dim each). Joint state feedback enables force adaptation through proprioceptive history. The policy is trained end-to-end using PPO with a force curriculum.
  • Figure 2: End-effector tracking error during training. The learned policy (blue) improves significantly over 5000 training iterations, achieving 34% lower error than the constant IK+PD baseline (purple).
  • Figure 3: Task-specific performance comparison. Left: Success rates, showing that the learned policy achieves 90.7% average success vs. 70.7% for IK+PD. Right: Tracking errors, showing a 2.2 cm average for the learned policy vs. 4.7 cm for the IK+PD baseline.
  • Figure 4: End-effector trajectory smoothness comparison over 1-second motion. The learned policy (blue solid) exhibits smoother position, velocity, and acceleration profiles compared to the jerky IK+PD baseline (purple dashed). Lower acceleration standard deviation indicates reduced jerk.
  • Figure 5: Task success rate under increasing external force disturbances. The learned policy with force curriculum (blue) maintains 87% success at 30 N forces, compared to 31% for the IK+PD baseline (purple) and 79% for the learned policy without curriculum (orange). Typical human interaction forces are around 15 N.
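
To make the dimensions in the Figure 1 caption concrete, the following is a minimal sketch of a feedforward pass with those input, hidden, and output sizes. It is not the paper's implementation. The caption does not state the size of the proprioception encoder, and the LSTM head mentioned in the TL;DR is not sized either, so both are omitted here. The ReLU activation, the plain concatenation of inputs, the two separate linear output heads, the random initialization, and all function names (`init_policy`, `policy_forward`, and so on) are assumptions made for illustration.

```python
import math
import random

# Dimensions taken from the Figure 1 caption.
VR_DIM, JOINT_DIM, CONTEXT_DIM = 14, 28, 6
HIDDEN_DIMS = (512, 256)
ACTION_DIM = 14  # size of each output head


def init_layer(n_in, n_out, rng):
    """Return (weights, biases); the uniform init is an illustrative assumption."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Affine map y = W x + b on plain Python lists."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_policy(seed=0):
    """Build untrained parameters for a 48 -> 512 -> 256 -> (14, 14) network."""
    rng = random.Random(seed)
    in_dim = VR_DIM + JOINT_DIM + CONTEXT_DIM  # 14 + 28 + 6 = 48
    return {
        "hidden1": init_layer(in_dim, HIDDEN_DIMS[0], rng),
        "hidden2": init_layer(HIDDEN_DIMS[0], HIDDEN_DIMS[1], rng),
        "position_head": init_layer(HIDDEN_DIMS[1], ACTION_DIM, rng),
        "torque_head": init_layer(HIDDEN_DIMS[1], ACTION_DIM, rng),
    }


def policy_forward(params, vr_pose, joint_state, task_context):
    """Map one observation to (joint position targets, feedforward torques)."""
    if (len(vr_pose), len(joint_state), len(task_context)) != (
            VR_DIM, JOINT_DIM, CONTEXT_DIM):
        raise ValueError("observation sizes must be 14, 28, and 6")
    # Assumption: inputs are concatenated; the paper's encoders are omitted.
    x = list(vr_pose) + list(joint_state) + list(task_context)
    h = relu(linear(params["hidden1"], x))
    h = relu(linear(params["hidden2"], h))
    return linear(params["position_head"], h), linear(params["torque_head"], h)
```

Calling `policy_forward(init_policy(), vr_pose, joint_state, task_context)` with lists of length 14, 28, and 6 returns two lists of length 14. In a 50 Hz control loop, one such call would run every 20 ms, with the weights coming from training rather than from the random initialization used here.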