Robo-GS: A Physics Consistent Spatial-Temporal Model for Robotic Arm with Hybrid Representation

Haozhe Lou, Yurong Liu, Yike Pan, Yiran Geng, Jianteng Chen, Wenlong Ma, Chenglong Li, Lin Wang, Hengzhen Feng, Lu Shi, Liyi Luo, Yongliang Shi

TL;DR

The paper tackles Real2Sim2Real gaps in robotic arm manipulation by introducing a hybrid representation that fuses mesh geometry, Gaussian primitives, and physics attributes through a Gaussian-Mesh-Pixel binding. This binding enables a differentiable pipeline in which real video, simulation, and rendering share a common spatiotemporal representation, supported by URDF-based kinematics and Newton-Euler dynamics. Key contributions include a unified asset representation, mesh extraction and alignment techniques, physics-aware forward and dynamic equations, and a proposed dataset for end-to-end policy training. Experimental results demonstrate improved mesh quality, high-fidelity rendering, and manipulable models that support Sim2Real transfer and novel-policy editing, with the potential to improve real-world robotic control and learning. The approach advances the state of the art in physics-consistent digital twins and enables more reliable policy transfer and vision-based manipulation.

Abstract

Real2Sim2Real plays a critical role in robotic arm control and reinforcement learning, yet bridging this gap remains a significant challenge due to the complex physical properties of robots and the objects they manipulate. Existing methods lack a comprehensive solution for accurately reconstructing real-world objects with spatial representations and their associated physics attributes. We propose a Real2Sim pipeline with a hybrid representation model that integrates mesh geometry, 3D Gaussian kernels, and physics attributes to enhance the digital asset representation of robotic arms. This hybrid representation is implemented through a Gaussian-Mesh-Pixel binding technique, which establishes an isomorphic mapping between mesh vertices and Gaussian models. This enables a fully differentiable rendering pipeline that can be optimized through numerical solvers, achieves high-fidelity rendering via Gaussian Splatting, and facilitates physically plausible simulation of the robotic arm's interaction with its environment using mesh-based methods. The code, full presentation, and datasets will be made publicly available on our website: https://robostudioapp.com
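
To make the vertex-to-Gaussian mapping concrete, here is a minimal sketch of one way such a binding could behave under a rigid link motion. It is an illustration under stated assumptions, not the authors' implementation: it assumes exactly one Gaussian per mesh vertex, with the Gaussian mean placed at the vertex, and every name in it (bind_gaussians, transform_link, the scales parameter) is hypothetical. When a link moves by a rotation R and translation t, each vertex x maps to R x + t and each bound covariance S maps to R S R^T, so mesh and Gaussians stay aligned.

```python
import math

def mat_vec(R, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]

def mat_mul(A, B):
    """Multiply two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def bind_gaussians(vertices, scales=(0.02, 0.01, 0.01)):
    """Assumption: one Gaussian per vertex, mean at the vertex,
    axis-aligned covariance diag(scales^2)."""
    cov = [[scales[i] ** 2 if i == j else 0.0 for j in range(3)]
           for i in range(3)]
    return [{"vertex_id": i, "mean": list(v), "cov": [row[:] for row in cov]}
            for i, v in enumerate(vertices)]

def transform_link(vertices, gaussians, R, t):
    """Apply the rigid motion x -> R x + t to the mesh vertices and carry
    every bound Gaussian along (mean follows its vertex, cov -> R cov R^T)."""
    new_vertices = [[a + b for a, b in zip(mat_vec(R, v), t)] for v in vertices]
    Rt = transpose(R)
    new_gaussians = [{
        "vertex_id": g["vertex_id"],
        "mean": new_vertices[g["vertex_id"]][:],
        "cov": mat_mul(mat_mul(R, g["cov"]), Rt),
    } for g in gaussians]
    return new_vertices, new_gaussians

# Two vertices of a hypothetical link, rotated 90 degrees about z and lifted by 1.
verts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
gauss = bind_gaussians(verts)
new_verts, new_gauss = transform_link(verts, gauss, rot_z(math.pi / 2), [0.0, 0.0, 1.0])
```

Because the Gaussians carry no motion parameters of their own in this sketch, the only quantities left to optimize for motion are the link transforms.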

Paper Structure

This paper contains 13 sections, 11 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: The reconstructed digital assets include the extracted mesh, Gaussian models, and a dynamically consistent kinematic model.
  • Figure 2: (Left) Our method converts monocular video into 3D meshes, Gaussians, and URDF models, and generates trajectories through Gaussian-Mesh-Pixel binding. (Right) It then renders dynamic interactions using forward deformation based on updated object and robotic arm trajectories.
  • Figure 3: Gaussian-Mesh-Pixel binding: This binding preserves the structural properties between Gaussian $G$, mesh $(V, F)$, and pixel location $P$ under affine transformation. It is a composition (right) of a Gaussian-Mesh mapping (top-left) and a projective Pixel-Gaussian binding (top-right).
  • Figure 4: Comparison between deformable Gaussian Splatting (left) and our solution (right). (Left) In general deformation-based Gaussian Splatting, the dynamics of the scene are embedded in each Gaussian primitive, resulting in a very large number of degrees of freedom (DoF). This causes training to fail in complex dynamic scenes, as observed in our setting. (Right) Our approach decomposes the motion into a small number of linkages and objects driven by governing equations, significantly reducing the degrees of freedom. This allows numerical solvers to optimize more efficiently.
  • Figure 5: Qualitative results for novel poses
  • ...and 2 more figures
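
The degrees-of-freedom argument in the Figure 4 caption can be illustrated with a small sketch. It is not taken from the paper: the per-Gaussian parameter count (3 translation plus 4 quaternion values per frame), the scene size of 100,000 Gaussians, and the 6-joint arm with one free rigid object are all assumptions chosen for illustration, and the function names are hypothetical. The planar forward-kinematics helper shows how a handful of joint angles determine the pose of every link, and therefore of everything bound to it.

```python
import math

def link_frames(joint_angles, link_lengths):
    """Planar forward kinematics for a serial chain: returns the
    (x, y, heading) frame at the base of each link and the tip position."""
    x, y, heading = 0.0, 0.0, 0.0
    frames = []
    for q, length in zip(joint_angles, link_lengths):
        heading += q                      # joint rotation accumulates along the chain
        frames.append((x, y, heading))
        x += length * math.cos(heading)   # advance to the end of this link
        y += length * math.sin(heading)
    return frames, (x, y)

def per_gaussian_dof(num_gaussians, params_per_gaussian=7):
    """Motion parameters per frame if every Gaussian deforms independently
    (assumed: 3 translation + 4 quaternion values each)."""
    return num_gaussians * params_per_gaussian

def articulated_dof(num_joints, num_objects, dof_per_object=6):
    """Motion parameters per frame if Gaussians ride on links and rigid objects:
    one value per joint plus a 6-DoF pose per free object."""
    return num_joints + num_objects * dof_per_object

frames, tip = link_frames([math.pi / 2, -math.pi / 2], [1.0, 1.0])
print(per_gaussian_dof(100_000))   # 700000
print(articulated_dof(6, 1))       # 12
```

Under these assumed numbers the optimizer sees 12 motion parameters per frame instead of 700,000, which is the kind of reduction the caption credits for more efficient numerical optimization.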