PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction
Sinisa Stekovic, Arslan Artykov, Stefan Ainetter, Mattia D'Urso, Friedrich Fraundorfer
TL;DR
PyTorchGeoNodes is a differentiable pipeline that converts Blender shape programs into PyTorch computation graphs, enabling end-to-end optimization of both continuous and discrete shape parameters for 3D reconstruction from RGB-D data. A genetic algorithm estimates the discrete parameters, while gradient-based refinement optimizes the continuous ones; the framework is also extended with Gaussian splatting to capture fine details. The method recovers object parameters accurately on real ScanNet scenes, performs competitively against baselines, and supports combining procedural shapes with Gaussians. This work advances interpretable, compact, and editable 3D reconstruction by uniting procedural modeling with differentiable optimization through a Blender-to-PyTorch compiler.
Abstract
We propose PyTorchGeoNodes, a differentiable module for reconstructing 3D objects and their parameters from images using interpretable shape programs. Unlike traditional CAD model retrieval, shape programs allow reasoning about semantic parameters, support editing, and have a low memory footprint. Despite this potential, shape programs have been largely overlooked in 3D scene understanding. Our key contribution is to enable gradient-based optimization by parsing shape programs, or more precisely procedural models designed in Blender, into efficient PyTorch code. While PyTorchGeoNodes has many possible applications, we show that combining it with a genetic algorithm is an effective method for optimizing both the discrete and the continuous parameters of a shape program, for 3D reconstruction and for understanding 3D object parameters. Our modular framework can be further integrated with other reconstruction algorithms, and we demonstrate one such integration that enables procedural Gaussian splatting. Our experiments on the ScanNet dataset show that our method achieves accurate reconstructions while enabling a previously unseen level of 3D scene understanding.
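To make the split between discrete and continuous parameters concrete, the following is a minimal, self-contained sketch of a hybrid search, not the PyTorchGeoNodes implementation. Everything in it is an assumption made for illustration: the toy one-dimensional "shape program" (a shelf with a discrete board count and a continuous height), the Chamfer-style loss, and the names `shape_program`, `chamfer`, `refine`, and `genetic_search`. To stay dependency-free, it uses a hand-derived gradient in place of PyTorch autograd.

```python
import random


def shape_program(n_shelves, height):
    """Toy shape program: heights of n shelf boards plus the top board."""
    coeffs = [i / (n_shelves + 1) for i in range(1, n_shelves + 1)] + [1.0]
    return [c * height for c in coeffs], coeffs


def chamfer(points, target):
    """Symmetric mean squared nearest-neighbour distance between 1D point sets."""
    fwd = sum(min((p - t) ** 2 for t in target) for p in points) / len(points)
    bwd = sum(min((p - t) ** 2 for p in points) for t in target) / len(target)
    return fwd + bwd


def chamfer_grad_height(n_shelves, height, target):
    """Analytic d(chamfer)/d(height); each point is coeff * height."""
    points, coeffs = shape_program(n_shelves, height)
    grad = 0.0
    for p, c in zip(points, coeffs):
        nearest = min(target, key=lambda t: (p - t) ** 2)
        grad += 2.0 * (p - nearest) * c / len(points)
    for t in target:
        i = min(range(len(points)), key=lambda k: (points[k] - t) ** 2)
        grad += 2.0 * (points[i] - t) * coeffs[i] / len(target)
    return grad


def refine(n_shelves, height, target, steps=100, lr=0.2):
    """Gradient descent on the continuous parameter; rejects steps that raise the loss."""
    loss = chamfer(shape_program(n_shelves, height)[0], target)
    for _ in range(steps):
        candidate = height - lr * chamfer_grad_height(n_shelves, height, target)
        cand_loss = chamfer(shape_program(n_shelves, candidate)[0], target)
        if cand_loss <= loss:
            height, loss = candidate, cand_loss
        else:
            lr *= 0.5  # step was too large, so shrink it and retry
    return height, loss


def genetic_search(target, pop_size=6, generations=10, seed=0):
    """Genetic search over the discrete count, refining height for every individual."""
    rng = random.Random(seed)
    population = [(rng.randint(1, 6), rng.uniform(0.5, 3.0)) for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        scored = []
        for n, h in population:
            h, loss = refine(n, h, target, steps=20)
            scored.append((loss, n, h))
        scored.sort()
        if best is None or scored[0][0] < best[0]:
            best = scored[0]
        # Keep the better half, then refill by mutating the discrete parameter.
        survivors = scored[: pop_size // 2]
        population = [(n, h) for _, n, h in survivors]
        while len(population) < pop_size:
            _, n, h = rng.choice(survivors)
            n = min(6, max(1, n + rng.choice([-1, 0, 1])))
            population.append((n, h))
    return best  # (loss, n_shelves, height)
```

Calling `genetic_search(shape_program(3, 2.0)[0])` returns a `(loss, n_shelves, height)` tuple for a synthetic target. In this sketch the discrete count changes only through mutation and selection, because the loss has no gradient with respect to it, while the height is updated by gradient steps inside each fitness evaluation.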
