IKDiffuser: a Diffusion-based Generative Inverse Kinematics Solver for Kinematic Trees

Zeyu Zhang, Ziyuan Jiao

TL;DR

IKDiffuser addresses the challenging inverse kinematics problem for arbitrary kinematic trees by formulating IK as probabilistic diffusion with a structure-agnostic, token-based representation of end-effector goals. It learns a generative prior over joint configurations conditioned on end-effector poses and enables inference-time task guidance via objective-guided sampling, along with masked marginal inference to support partially specified goals. The framework supports task-specific objectives such as warm-start initialization and manipulability maximization without retraining, and it can seed optimization-based IK solvers to greatly boost success rates while delivering millisecond latency. Extensive experiments across eight robotic platforms demonstrate superior accuracy, diversity, and collision avoidance compared to baselines, and show dramatic improvements in seeding optimization-based solvers for high-DoF systems. The work offers a scalable, adaptable primitive for real-time planning and control in complex robots, enabling flexible task specifications without sacrificing precision.

Abstract

Solving Inverse Kinematics (IK) for arbitrary kinematic trees presents significant challenges due to their high dimensionality, redundancy, and complex inter-branch constraints. Conventional optimization-based solvers can be sensitive to initialization and suffer from local minima or conflicting gradients. At the same time, existing learning-based approaches are often tied to a predefined number of end-effectors and a fixed training objective, limiting their reusability across various robot morphologies and task requirements. To address these limitations, we introduce IKDiffuser, a scalable IK solver built upon conditional diffusion-based generative models, which learns the distribution of the configuration space conditioned on end-effector poses. We propose a structure-agnostic formulation that represents end-effector poses as a sequence of tokens, leading to a unified framework that handles varying numbers of end-effectors while learning the implicit kinematic structures entirely from data. Beyond standard IK generation, IKDiffuser handles partially specified goals via a masked marginalization mechanism that conditions only on a subset of end-effector constraints. Furthermore, it supports adding task objectives at inference through objective-guided sampling, enabling capabilities such as warm-start initialization and manipulability maximization without retraining. Extensive evaluations across seven diverse robotic platforms demonstrate that IKDiffuser significantly outperforms state-of-the-art baselines in accuracy, solution diversity, and collision avoidance. Moreover, when used to initialize optimization-based solvers, IKDiffuser significantly boosts success rates on challenging redundant systems with high Degrees of Freedom (DoF), such as the 29-DoF Unitree G1 humanoid, from 21.01% to 96.96% while reducing computation time to the millisecond range.
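
To make the token-based goal representation and the masking of partially specified goals more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. The 7-vector pose layout (position plus unit quaternion), the helper names `make_goal_tokens` and `masked_attention_weights`, and the choice to realize masking by zeroing attention on unspecified tokens are all assumptions made for illustration.

```python
import numpy as np

def make_goal_tokens(poses, max_tokens):
    """Pack a variable number of end-effector goals into fixed-size tokens.

    poses: dict mapping an end-effector index to a 7-vector
           (xyz position + wxyz quaternion), or to None when that
           end-effector's goal is left unspecified. (Assumed layout.)
    Returns (tokens, mask): tokens has shape (max_tokens, 7); mask is True
    where a goal is specified and False for padding or unspecified goals.
    """
    tokens = np.zeros((max_tokens, 7))
    mask = np.zeros(max_tokens, dtype=bool)
    for idx, pose in poses.items():
        if pose is None:
            continue  # unspecified goal: token stays zero and masked out
        pose = np.array(pose, dtype=float)  # copy, so the input is not mutated
        if pose.shape != (7,):
            raise ValueError("each pose must be a 7-vector")
        pose[3:] /= np.linalg.norm(pose[3:])  # normalize the quaternion part
        tokens[idx] = pose
        mask[idx] = True
    return tokens, mask

def masked_attention_weights(scores, mask):
    """Softmax over goal tokens that assigns zero weight to masked tokens."""
    if not mask.any():
        raise ValueError("at least one goal token must be specified")
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores)        # exp(-inf) is exactly 0.0
    return weights / weights.sum()

# Two end-effectors on a robot with up to four; only the first has a goal.
goals = {0: [0.4, 0.2, 0.9, 1.0, 0.0, 0.0, 0.0], 1: None}
tokens, mask = make_goal_tokens(goals, max_tokens=4)
weights = masked_attention_weights(np.zeros(4), mask)
```

Because the number of tokens is decoupled from the robot's structure, the same packing routine serves robots with different numbers of end-effectors; only `max_tokens` and the mask change.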

Paper Structure

This paper contains 24 sections, 22 equations, 6 figures, 8 tables, 2 algorithms.

Figures (6)

  • Figure 1: Architecture of IKDiffuser. The model generates inverse kinematics solutions $\boldsymbol{q}^0$ by iteratively denoising Gaussian noise $\boldsymbol{q}^T$ over $T$ timesteps, conditioned on target end-effector poses $\mathcal{X}$. Each end-effector pose $\boldsymbol{x}_i$ is embedded with positional encoding (PE), while timestep $t$ is integrated with denoised configuration $\boldsymbol{q}^t$ through a Residual block. The Transformer block employs cross-attention to learn the relation between joint configurations and end-effector poses, with detailed block structures shown in the colored boxes on the right.
  • Figure 2: Task 2: Success rate versus optimization iterations when seeding an optimization-based IK solver (cuRobo) with IKDiffuser.
  • Figure 3: Task 3: Computational Time. Time-to-solution vs. batch size across eight robot platforms for cuRobo.
  • Figure 4: Task 3: Computational Time. Time-to-solution vs. batch size across eight robot platforms for Pink.
  • Figure 5: Task 5: Illustration of marginal inference results. Joints belonging to chains with unconstrained end-effectors remain fixed.
  • ...and 1 more figure
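
The iterative denoising described in the Figure 1 caption, together with the objective-guided sampling mentioned in the abstract, can be sketched as a standard DDPM-style reverse loop with an optional gradient term. This is a hedged illustration, not the paper's algorithm: the linear noise schedule, the function names, the classifier-guidance-style mean shift, and the analytic stand-in for the trained denoiser (`make_point_mass_eps`, which is exact only for a dataset containing the single configuration `q_star`) are all assumptions.

```python
import numpy as np

def make_schedule(T, beta_start=1e-4, beta_end=2e-2):
    """Linear DDPM noise schedule (an assumed choice, not from the paper)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)

def sample_configuration(eps_model, goal, dof, T=50, objective_grad=None,
                         guidance_scale=0.0, rng=None):
    """Denoise q^T ~ N(0, I) down to q^0, conditioned on `goal`.

    eps_model(q, t, goal) predicts the noise in q at step t.
    objective_grad(q), if given, is the gradient of a task cost to minimize;
    it shifts the posterior mean in the style of classifier guidance.
    """
    rng = np.random.default_rng() if rng is None else rng
    betas, alphas, alpha_bars = make_schedule(T)
    q = rng.standard_normal(dof)  # q^T
    for t in range(T - 1, -1, -1):
        eps = eps_model(q, t, goal)
        # Standard DDPM posterior mean.
        mean = (q - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if objective_grad is not None:
            # Move the mean downhill on the task cost, scaled by the step variance.
            mean = mean - guidance_scale * betas[t] * objective_grad(q)
        noise = rng.standard_normal(dof) if t > 0 else np.zeros(dof)
        q = mean + np.sqrt(betas[t]) * noise
    return q  # q^0

def make_point_mass_eps(q_star, T=50):
    """Toy stand-in for a trained denoiser: the exact noise predictor when
    the data distribution is a single configuration q_star. The goal is ignored."""
    _, _, alpha_bars = make_schedule(T)
    def eps_model(q, t, goal):
        return (q - np.sqrt(alpha_bars[t]) * q_star) / np.sqrt(1.0 - alpha_bars[t])
    return eps_model

q_star = np.array([0.3, -1.2, 0.8])
toy_eps = make_point_mass_eps(q_star)
q0 = sample_configuration(toy_eps, goal=None, dof=3, rng=np.random.default_rng(0))

# Warm-start style guidance: pull samples toward a reference configuration.
q_ref = np.array([0.0, -1.0, 1.0])
q_guided = sample_configuration(toy_eps, goal=None, dof=3,
                                objective_grad=lambda q: q - q_ref,
                                guidance_scale=5.0,
                                rng=np.random.default_rng(0))
```

In a real system `eps_model` would be the trained network that attends to the end-effector pose tokens, and the objective gradient could come from any differentiable task cost, which is what allows new objectives to be added at inference without retraining.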