InstantGeoAvatar: Effective Geometry and Appearance Modeling of Animatable Avatars from Monocular Video
Alvaro Budria, Adrian Lopez-Rodriguez, Oscar Lorente, Francesc Moreno-Noguer
TL;DR
InstantGeoAvatar tackles the problem of reconstructing and animating detailed 3D clothed human avatars from monocular RGB video at interactive speeds. It introduces a canonical signed distance field $f_{sdf}$ and a texture field $f_{rgb}$ parameterized on a multiresolution hash grid. Both fields are regularized by a geometry-aware surface term $\mathcal{L}_{smooth}$, which is integrated into differentiable volume rendering to stabilize hash-grid optimization. Training combines a photometric loss, a mask loss, an Eikonal loss, and the proposed smoothing loss. The result is fast and robust optimization that delivers competitive geometry reconstruction and novel-view synthesis in as little as $5$–$10$ minutes. The approach enables interactive reconstruction of virtual avatars with improved surface coherence, watertight meshes, and efficient rendering suitable for AR/VR workflows.
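The training objective above is a weighted sum of four terms. The following is a minimal NumPy sketch of how such a sum could be assembled, not the authors' implementation. Several choices are assumptions: the L1 form of the photometric term, the binary cross-entropy form of the mask term, the loss weights, and all function names. Only the Eikonal term, $(\lVert \nabla f_{sdf}(\mathbf{x}) \rVert - 1)^2$ averaged over sampled points, follows its standard definition. The exact form of $\mathcal{L}_{smooth}$ is not given here, so it is passed in as a precomputed placeholder value.

```python
import numpy as np

def photometric_loss(rgb_pred: np.ndarray, rgb_gt: np.ndarray) -> float:
    """L1 colour error between rendered and observed pixels, both (N, 3). L1 is an assumption."""
    return float(np.mean(np.abs(rgb_pred - rgb_gt)))

def mask_loss(opacity: np.ndarray, mask: np.ndarray, eps: float = 1e-6) -> float:
    """Binary cross-entropy between accumulated ray opacity and the foreground mask (assumed form)."""
    o = np.clip(opacity, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(mask * np.log(o) + (1.0 - mask) * np.log(1.0 - o)))

def eikonal_loss(sdf_grad: np.ndarray) -> float:
    """Penalise deviation of ||grad f_sdf|| from 1 at sampled points; sdf_grad has shape (N, 3)."""
    norms = np.linalg.norm(sdf_grad, axis=-1)
    return float(np.mean((norms - 1.0) ** 2))

def total_loss(terms: dict, weights: dict) -> float:
    """Weighted sum of the individual loss terms; every key in `terms` must appear in `weights`."""
    return float(sum(weights[name] * value for name, value in terms.items()))

# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"rgb": 1.0, "mask": 0.1, "eikonal": 0.1, "smooth": 0.01}

# Usage with random stand-in data.
rng = np.random.default_rng(0)
terms = {
    "rgb": photometric_loss(rng.random((4, 3)), rng.random((4, 3))),
    "mask": mask_loss(np.array([0.9, 0.1]), np.array([1.0, 0.0])),
    "eikonal": eikonal_loss(rng.normal(size=(8, 3))),
    "smooth": 0.0,  # placeholder for L_smooth, whose form is not specified in this summary
}
loss = total_loss(terms, WEIGHTS)
```

In a real training loop, `sdf_grad` would come from automatic differentiation of $f_{sdf}$ at the ray samples, and every term would be differentiable with respect to the hash-grid parameters.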
Abstract
We present InstantGeoAvatar, a method for efficiently and effectively learning the detailed 3D geometry and appearance of animatable implicit human avatars from monocular video. Our key observation is that optimizing a hash grid encoding to represent a signed distance function (SDF) of the human subject is fraught with instabilities and bad local minima. We therefore propose a principled, geometry-aware SDF regularization scheme that fits seamlessly into the volume rendering pipeline and adds negligible computational overhead. This regularization scheme significantly outperforms previous approaches for training SDFs on hash grids. We obtain competitive results in geometry reconstruction and novel view synthesis in as little as five minutes of training, down from the several hours required by previous work. InstantGeoAvatar represents a significant leap forward towards interactive reconstruction of virtual avatars.
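The abstract's starting point is an SDF stored in a hash grid encoding. The following is a minimal NumPy sketch of an Instant-NGP-style multiresolution hash encoding, included only to illustrate that representation. At each level, the eight corners of the grid cell enclosing a point are hashed into a small feature table, and their features are trilinearly interpolated. The concatenated per-level features would typically be decoded by a small MLP into $f_{sdf}$. Several parts of the sketch are hypothetical:

- the class and function names;
- the level count, table size, base and maximum resolutions;
- the initialization range.

The sketch also omits the MLP decoder, dense (unhashed) indexing at coarse levels, and gradient-based training. It is not the authors' implementation.

```python
import numpy as np

# Per-axis primes for the spatial hash (first axis uses 1), uint32 arithmetic wraps mod 2^32.
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint32)

def hash_coords(ijk: np.ndarray, table_size: int) -> np.ndarray:
    """Hash non-negative integer vertex coordinates (N, 3) to table indices in [0, table_size)."""
    ijk = ijk.astype(np.uint32)
    h = (ijk[:, 0] * PRIMES[0]) ^ (ijk[:, 1] * PRIMES[1]) ^ (ijk[:, 2] * PRIMES[2])
    return (h % np.uint32(table_size)).astype(np.int64)

class HashGridEncoding:
    """Hypothetical multiresolution hash encoding of points in [0, 1]^3."""

    def __init__(self, n_levels=4, n_features=2, log2_table_size=14,
                 base_res=16, max_res=256, seed=0):
        rng = np.random.default_rng(seed)
        self.T = 2 ** log2_table_size
        # Geometric growth of grid resolution from base_res to max_res across levels.
        growth = np.exp((np.log(max_res) - np.log(base_res)) / max(n_levels - 1, 1))
        self.resolutions = [int(np.floor(base_res * growth ** l)) for l in range(n_levels)]
        # Trainable feature tables, one per level, with a small uniform initialization.
        self.tables = rng.uniform(-1e-4, 1e-4, size=(n_levels, self.T, n_features))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        """Map points x of shape (N, 3) to features of shape (N, n_levels * n_features)."""
        feats = []
        for level, res in enumerate(self.resolutions):
            pos = x * res
            base = np.floor(pos).astype(np.int64)  # lower corner of the enclosing cell
            frac = pos - base                      # position inside the cell, in [0, 1)
            out = np.zeros((x.shape[0], self.tables.shape[2]))
            for corner in range(8):
                offset = np.array([(corner >> d) & 1 for d in range(3)])
                # Trilinear weight: frac along axes where the corner is offset, 1 - frac otherwise.
                w = np.prod(np.where(offset == 1, frac, 1.0 - frac), axis=1)
                idx = hash_coords(base + offset, self.T)
                out += w[:, None] * self.tables[level, idx]
            feats.append(out)
        return np.concatenate(feats, axis=1)

# Usage: encode a few random points.
enc = HashGridEncoding(n_levels=3, n_features=2, log2_table_size=10)
features = enc(np.random.default_rng(1).random((5, 3)))
```

Hash collisions at fine levels are resolved only implicitly, through optimization of the shared table entries. This is one plausible source of the instabilities that the abstract attributes to fitting an SDF on such an encoding.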
