Efficient Learning of Fast Inverse Kinematics with Collision Avoidance
Johannes Tenhumberg, Arman Mielke, Berthold Bäuml
TL;DR
The paper addresses the computation of collision-free inverse kinematics (IK) for high-DoF robots in arbitrary, sensor-derived environments. It combines an optimization-based IK solver with learned warm starts, supports both supervised and unsupervised training, and encodes environments with a Basis Point Set. The key contributions are a fast solver that runs in under 10 ms on a single CPU core for a 19-DoF humanoid, a twin-headed network with a singularity-free unit-vector output, and an unsupervised training regime that skips data generation while performing comparably to or better than supervised training. The results show large speedups and robust generalization to unseen 3D scenes, making real-time collision-free IK practical for real-world manipulation and grasping tasks.
Abstract
Fast inverse kinematics (IK) is a central component in robotic motion planning. For complex robots, IK methods are often based on root search and non-linear optimization algorithms. These algorithms can be sped up massively by using a neural network to predict a good initial guess, which is then refined in a few numerical iterations. Going beyond previous work on learning-based IK, we present a learning approach for the fundamentally more complex problem of IK with collision avoidance in diverse and previously unseen environments. From a detailed analysis of the IK learning problem, we derive a network and an unsupervised learning architecture that removes the need for a sample data generation step. Using the trained network's prediction as the initial guess for a two-stage Jacobian-based solver allows for fast and accurate computation of the collision-free IK. For the humanoid robot Agile Justin (19 DoF), the collision-free IK is solved in less than 10 milliseconds on a single CPU core, with an accuracy of 10^-4 m and 10^-3 rad, based on a high-resolution world model generated from the robot's integrated 3D sensor. Our method massively outperforms a random multi-start baseline in a benchmark with the 19-DoF humanoid and challenging 3D environments. It requires ten times less training time than a supervised training method while achieving comparable results.
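The warm-start idea described above can be illustrated with a minimal sketch. It is not the paper's solver: it assumes a hypothetical planar 3-link arm, handles only the end-effector position (no collision avoidance, no two-stage structure), and replaces the network prediction with a slightly perturbed ground-truth configuration. The names `LINK_LENGTHS`, `forward_kinematics`, `jacobian`, and `refine_ik` are illustrative, and the refinement is a standard damped least-squares iteration chosen for brevity.

```python
import numpy as np

# Hypothetical planar 3-link arm; link lengths are illustrative only.
LINK_LENGTHS = np.array([1.0, 0.8, 0.5])


def forward_kinematics(q):
    """End-effector position (x, y) of a planar serial arm."""
    angles = np.cumsum(q)  # absolute orientation of each link
    return np.array([
        np.sum(LINK_LENGTHS * np.cos(angles)),
        np.sum(LINK_LENGTHS * np.sin(angles)),
    ])


def jacobian(q):
    """2 x n position Jacobian of the planar arm."""
    angles = np.cumsum(q)
    n = len(q)
    J = np.zeros((2, n))
    for i in range(n):
        # Joint i rotates every link from index i onward.
        J[0, i] = -np.sum(LINK_LENGTHS[i:] * np.sin(angles[i:]))
        J[1, i] = np.sum(LINK_LENGTHS[i:] * np.cos(angles[i:]))
    return J


def refine_ik(q_init, target, tol=1e-4, max_iter=100, damping=1e-3):
    """Damped least-squares refinement starting from an initial guess.

    Returns the refined joint angles and the number of iterations used.
    """
    q = np.array(q_init, dtype=float)
    for it in range(max_iter):
        err = target - forward_kinematics(q)
        if np.linalg.norm(err) < tol:
            return q, it
        J = jacobian(q)
        # Solve (J J^T + damping * I) y = err, then step dq = J^T y.
        y = np.linalg.solve(J @ J.T + damping * np.eye(2), err)
        q = q + J.T @ y
    return q, max_iter


# Stand-in for a network prediction: the true solution plus a small offset.
q_true = np.array([0.3, 0.5, -0.4])
target = forward_kinematics(q_true)
q_warm = q_true + 0.05

q_sol, iters = refine_ik(q_warm, target)
print("iterations:", iters)
print("position error:", np.linalg.norm(forward_kinematics(q_sol) - target))
```

Because the arm is redundant (three joints, two task dimensions), the refined solution generally differs from `q_true` while still reaching the target. The point of the sketch is only that a guess close to a valid solution needs few Jacobian steps to reach the tolerance.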
