HL-IK: A Lightweight Implementation of Human-Like Inverse Kinematics in Humanoid Arms

Bingjie Chen, Zihan Wang, Zhe Han, Guoping Pan, Yi Cheng, Houde Liu

TL;DR

This paper presents Human-Like Inverse Kinematics (HL-IK), a lightweight IK framework that preserves end-effector (EE) tracking while shaping whole-arm configurations to appear human-like, without requiring full-body sensing at runtime.

Abstract

Traditional IK methods for redundant humanoid manipulators emphasize end-effector (EE) tracking, frequently producing configurations that are mechanically valid but not human-like. We present Human-Like Inverse Kinematics (HL-IK), a lightweight IK framework that preserves EE tracking while shaping whole-arm configurations to appear human-like, without full-body sensing at runtime. The key idea is a learned elbow prior: using large-scale human motion data retargeted to the robot, we train a FiLM-modulated spatio-temporal attention network (FiSTA) to predict the next-step elbow pose from the EE target and a short history of EE-elbow states. This prediction is incorporated as a small residual alongside EE and smoothness terms in a standard Levenberg–Marquardt optimizer, making HL-IK a drop-in addition to numerical IK stacks. Over 183k simulation steps, HL-IK reduces arm-similarity position and direction error by 30.6% and 35.4% on average, and by 42.2% and 47.4% on the most challenging trajectories. Hardware teleoperation on a robot distinct from the one used in simulation further confirms the gains in anthropomorphism. HL-IK is simple to integrate, adaptable across platforms via our pipeline, and adds minimal computation, enabling human-like motions for humanoid robots.
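
To illustrate how an elbow prediction can enter a Levenberg–Marquardt solve as a small extra residual, the following is a minimal sketch on a toy planar 3-link arm, not the paper's implementation. The link lengths, the weights `w_elbow` and `w_smooth`, the finite-difference Jacobian, and the damping schedule are all assumptions chosen for illustration; the elbow prediction is supplied by hand here rather than by FiSTA.

```python
import numpy as np

LINKS = np.array([0.3, 0.25, 0.1])  # hypothetical link lengths in metres

def forward(q):
    """Return (elbow, ee) positions of a planar 3-link arm rooted at the origin."""
    a = np.cumsum(q)  # absolute angle of each link
    pts = np.cumsum(np.stack([LINKS * np.cos(a), LINKS * np.sin(a)], axis=1), axis=0)
    return pts[0], pts[2]  # end of link 1 (elbow), end of link 3 (EE)

def residual(q, q_prev, ee_target, elbow_pred, w_elbow, w_smooth):
    """Stack the EE tracking, elbow prior, and smoothness residuals."""
    elbow, ee = forward(q)
    return np.concatenate([
        ee - ee_target,                  # EE tracking (weight 1)
        w_elbow * (elbow - elbow_pred),  # small elbow-alignment term
        w_smooth * (q - q_prev),         # stay close to the previous solution
    ])

def jacobian(f, q, eps=1e-6):
    """Forward-difference Jacobian of the residual vector."""
    r0 = f(q)
    J = np.zeros((r0.size, q.size))
    for i in range(q.size):
        dq = np.zeros_like(q)
        dq[i] = eps
        J[:, i] = (f(q + dq) - r0) / eps
    return J

def solve_ik(q_prev, ee_target, elbow_pred, w_elbow=0.1, w_smooth=0.01,
             iters=100, lam=1e-3):
    """Damped least-squares (Levenberg-Marquardt) iterations from q_prev."""
    q = q_prev.copy()
    f = lambda x: residual(x, q_prev, ee_target, elbow_pred, w_elbow, w_smooth)
    for _ in range(iters):
        r = f(q)
        J = jacobian(f, q)
        step = np.linalg.solve(J.T @ J + lam * np.eye(q.size), -J.T @ r)
        if np.sum(f(q + step) ** 2) < np.sum(r ** 2):
            q, lam = q + step, lam * 0.5  # accept the step, reduce damping
        else:
            lam *= 10.0                   # reject the step, increase damping
    return q

# Usage: the same EE target solved with and without the elbow term.
q_ref = np.array([0.3, 1.0, 0.4])          # configuration defining the targets
elbow_pred, ee_target = forward(q_ref)
q_prev = np.array([0.8, 0.6, 0.2])
q_with_prior = solve_ik(q_prev, ee_target, elbow_pred)
q_without_prior = solve_ik(q_prev, ee_target, elbow_pred, w_elbow=0.0)
```

Because the arm has three joints and the EE target constrains only two coordinates, one degree of freedom is left over. Setting `w_elbow=0.0` recovers a plain EE-plus-smoothness solve, which makes it easy to compare where the elbow ends up in each case.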

Paper Structure

This paper contains 29 sections, 10 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: In physical teleoperation, when only the EE pose is used as input, our HL-IK method achieves a more human-like effect, with the arm configuration more closely resembling that of humans.
  • Figure 2: Our method begins with a large-scale human motion dataset, which we retarget to the robot to extract an EE–elbow mapping dataset used to train our FiSTA network. After training, during human teleoperation, we obtain the operator’s desired current EE pose via a VR device. Given this target and a fixed-length history of past frames, FiSTA predicts the elbow position. We then augment the IK objective with an elbow-alignment cost and solve for the desired joint angles using Levenberg–Marquardt iterations. The resulting commands are sent to the robot’s low-level controller.
  • Figure 3: EE-Elbow Data Collection. For each frame, we extract the poses of the EE and the elbow relative to the corresponding shoulder frame.
  • Figure 4: Model Architecture. A GRU encodes the 5-frame history to produce temporal features, which are FiLM-conditioned using the next EE target. In parallel, an attention module computes spatial features from the last frame. The two streams are then concatenated to predict the elbow configuration.
  • Figure 5: Model Evaluation Comparison. The y-axis shows validation MSE over epochs. The red star marks the lowest loss of FiSTA. All models use a 5-frame history, with other hyperparameters set to their best configuration.
  • ...and 4 more figures
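
The FiLM conditioning step named in the Figure 4 caption can be sketched in a few lines. This is a minimal illustration of feature-wise linear modulation only: the GRU encoder and the attention stream are omitted, the weights are random rather than trained, and the sizes (`feat_dim`, `cond_dim`) and the 7-D EE target layout are assumptions, not values taken from the paper.

```python
import numpy as np

def film(features, cond, W_gamma, b_gamma, W_beta, b_beta):
    """Feature-wise linear modulation: scale and shift each feature
    channel by linear functions of the conditioning vector."""
    gamma = cond @ W_gamma + b_gamma  # per-channel scale
    beta = cond @ W_beta + b_beta     # per-channel shift
    return gamma * features + beta

rng = np.random.default_rng(0)
feat_dim, cond_dim = 16, 7  # hypothetical sizes

h = rng.normal(size=feat_dim)        # stand-in for temporal features from the history encoder
ee_next = rng.normal(size=cond_dim)  # stand-in for the next EE target

# Untrained, randomly initialised projection weights (illustration only).
W_g = rng.normal(scale=0.1, size=(cond_dim, feat_dim))
W_b = rng.normal(scale=0.1, size=(cond_dim, feat_dim))
b_g = np.ones(feat_dim)   # scale starts near 1, so features pass through
b_b = np.zeros(feat_dim)  # shift starts near 0

h_cond = film(h, ee_next, W_g, b_g, W_b, b_b)  # conditioned features, shape (feat_dim,)
```

In this sketch the conditioning vector changes only the scale and shift of each channel, so the same history features can be reweighted differently for each EE target.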