Learning High-Fidelity Robot Self-Model with Articulated 3D Gaussian Splatting
Kejun Hu, Peng Yu, Ning Tan
TL;DR
This work presents a texture-aware robot self-modeling framework based on 3D Gaussian Splatting (3DGS) that jointly models morphology, texture, and kinematics from multi-view RGB images. A static self-model is first built from rest-pose data; neural ellipsoid bones and a kinematic network are then learned to deform the Gaussian representation according to joint angles, which enables real-time rendering and control via Gaussian splatting. The approach achieves high geometric fidelity and link-level texture rendering, outperforming depth-dependent methods while supporting downstream tasks such as motion planning and inverse kinematics (IK). It is validated in simulation and on a physical robot, demonstrating accurate rendering, mesh extraction, and practical applications including reaching, surface touching, obstacle avoidance, and IK estimation, with potential extension to other robotic platforms and to soft robots.
Abstract
Self-modeling enables robots to build task-agnostic models of their morphology and kinematics from automatically collected data, with minimal human intervention and prior information, thereby enhancing machine intelligence. Recent research has highlighted the potential of data-driven techniques for modeling the morphology and kinematics of robots. However, existing self-modeling methods suffer from either low modeling quality or excessive data acquisition costs. Beyond morphology and kinematics, texture is also a crucial component of robots, yet it is challenging to model and remains unexplored. In this work, a high-quality, texture-aware, link-level method is proposed for robot self-modeling. We use three-dimensional (3D) Gaussians to represent the static morphology and texture of a robot, and cluster these Gaussians to construct neural ellipsoid bones, whose deformations are controlled by transformation matrices generated by a kinematic neural network. The 3D Gaussians and the kinematic neural network are trained on samples composed of joint angles, camera parameters, and multi-view images, without depth information. Given joint angles as input to the kinematic neural network, the trained model describes the corresponding morphology, kinematics, and texture of the robot at the link level, and renders robot images from different viewpoints using 3D Gaussian splatting. Furthermore, we demonstrate that the resulting model can be used for downstream tasks such as motion planning and inverse kinematics.
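
The deformation step described above (Gaussians assigned to ellipsoid bones, with each bone moved by a joint-angle-dependent transformation matrix) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes axis-aligned ellipsoids, a softmax weighting over scaled distances, and linear blending of per-bone rigid transforms, none of which is specified in this abstract. The function names `bone_weights`, `deform`, and `toy_kinematics` are hypothetical, and `toy_kinematics` is an analytic stand-in for the learned kinematic neural network.

```python
import numpy as np

def bone_weights(points, centers, radii, temperature=1.0):
    """Softly assign points (N, 3) to B axis-aligned ellipsoid bones.

    Assumption: weights are a softmax over negative squared distances,
    scaled per axis by each ellipsoid's radii (B, 3).
    """
    diff = points[:, None, :] - centers[None, :, :]          # (N, B, 3)
    d2 = np.sum((diff / radii[None, :, :]) ** 2, axis=-1)    # (N, B)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)              # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def rot_z_about(theta, pivot):
    """4x4 rigid transform: rotate by theta about the z-axis through pivot."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = pivot - T[:3, :3] @ pivot
    return T

def toy_kinematics(theta, pivot):
    """Stand-in for the kinematic network: joint angle -> per-bone transforms.

    Bone 0 is fixed; bone 1 rotates about a single joint at pivot.
    """
    return np.stack([np.eye(4), rot_z_about(theta, pivot)])  # (2, 4, 4)

def deform(points, weights, transforms):
    """Blend per-bone transforms (B, 4, 4) over points (N, 3) by weights (N, B)."""
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)  # (N, 4)
    per_bone = np.einsum('bij,nj->nbi', transforms, homo)               # (N, B, 4)
    blended = np.einsum('nb,nbi->ni', weights, per_bone)                # (N, 4)
    return blended[:, :3]

# Two Gaussian centers on a two-link planar arm, one per link.
gaussian_centers = np.array([[0.2, 0.0, 0.0], [1.8, 0.0, 0.0]])
bone_centers = np.array([[0.5, 0.0, 0.0], [1.5, 0.0, 0.0]])
bone_radii = np.array([[0.5, 0.1, 0.1], [0.5, 0.1, 0.1]])
joint_pivot = np.array([1.0, 0.0, 0.0])

weights = bone_weights(gaussian_centers, bone_centers, bone_radii, temperature=0.05)
deformed = deform(gaussian_centers, weights, toy_kinematics(np.pi / 2, joint_pivot))
print(deformed)
```

Only the Gaussian centers are moved here. A fuller version would also need to rotate each Gaussian's covariance, and it would replace `toy_kinematics` with a network trained against rendered multi-view images.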
