Hierarchically Structured Neural Bones for Reconstructing Animatable Objects from Casual Videos

Subin Jeon, In Cho, Minsu Kim, Woong Oh Cho, Seon Joo Kim

TL;DR

This work introduces Hierarchically Structured Neural Bones (HSNB) to reconstruct animatable 3D models from casual videos by learning a tree-structured hierarchy of Gaussian ellipsoid bones. The key innovations are a hierarchical deformation model that captures coarse-to-fine motions and a bone occupancy regularization via a bone occupancy function to align bones with object surfaces, yielding interpretable control points and easier manipulation. The method extends BANMo with structured bones, enabling fewer control points, better reconstruction quality, and intuitive coarse-to-fine manipulation, including dynamic addition and deletion of bones. Experimental results across humans and animals show improved reconstruction and rendering quality, stronger manipulation capabilities, and clear interpretability of motion structure, with code available for reuse. The approach significantly lowers the barrier to obtaining and animating arbitrary objects from casual videos, while also acknowledging potential societal impacts such as privacy concerns and job disruption.

Abstract

We propose a new framework for creating and easily manipulating 3D models of arbitrary objects using casually captured videos. Our core ingredient is a novel hierarchy deformation model, which captures the motions of objects with tree-structured bones. Our hierarchy system decomposes motions based on their granularity and reveals the correlations between parts without exploiting any prior structural knowledge. We further propose to regularize the bones to be positioned at the basis of motions, at the centers of parts, sufficiently covering the related surfaces of each part. This is achieved by our bone occupancy function, which identifies whether a given 3D point is placed within the bone. Coupling the proposed components, our framework offers several clear advantages: (1) users can obtain animatable 3D models of arbitrary objects with improved quality from their casual videos, (2) users can manipulate 3D models in an intuitive manner at minimal cost, and (3) users can interactively add or delete control points as necessary. The experimental results demonstrate the efficacy of our framework on diverse instances, in terms of reconstruction quality, interpretability, and ease of manipulation. Our code is available at https://github.com/subin6/HSNB.
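To make the idea of a bone occupancy function concrete, the following is a minimal sketch, not the authors' implementation. It assumes each bone is a Gaussian ellipsoid described by a center, three orthonormal local axes, and per-axis radii, and it treats a point as inside the bone when its squared Mahalanobis distance is at most 1. The names `EllipsoidBone`, `mahalanobis_sq`, `soft_occupancy`, and `is_inside`, as well as the threshold value, are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Sequence

Vec3 = Sequence[float]


@dataclass
class EllipsoidBone:
    """Hypothetical Gaussian ellipsoid bone (an assumption, not the paper's class)."""
    center: Vec3
    axes: Sequence[Vec3]  # three orthonormal local axes, expressed in world coordinates
    radii: Vec3           # extent of the ellipsoid along each local axis


def mahalanobis_sq(bone: EllipsoidBone, point: Vec3) -> float:
    """Squared distance from the bone center, scaled by the radius along each axis."""
    offset = [p - c for p, c in zip(point, bone.center)]
    total = 0.0
    for axis, radius in zip(bone.axes, bone.radii):
        # Project the offset onto this local axis, then normalize by the radius.
        projection = sum(a * o for a, o in zip(axis, offset))
        total += (projection / radius) ** 2
    return total


def soft_occupancy(bone: EllipsoidBone, point: Vec3) -> float:
    """Smooth occupancy in (0, 1]: 1 at the center, decaying with distance."""
    return math.exp(-0.5 * mahalanobis_sq(bone, point))


def is_inside(bone: EllipsoidBone, point: Vec3, threshold: float = 1.0) -> bool:
    """Hard occupancy test; the threshold of 1.0 is an illustrative assumption."""
    return mahalanobis_sq(bone, point) <= threshold


bone = EllipsoidBone(
    center=(0.0, 0.0, 0.0),
    axes=((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)),
    radii=(2.0, 1.0, 1.0),
)
print(is_inside(bone, (1.0, 0.0, 0.0)))  # along the long axis, within the radius
print(is_inside(bone, (0.0, 1.5, 0.0)))  # beyond the short-axis radius
```

A smooth variant such as `soft_occupancy` is differentiable in the bone parameters, which is the kind of property a regularizer would typically need; whether the paper uses this exact form is not stated in the text above.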

Paper Structure (29 sections, 16 equations, 20 figures, 9 tables)

Figures (20)

  • Figure 1: We aim to reconstruct animatable models that can be manipulated in a coarse-to-fine manner, using multiple videos capturing a deformable object. The resulting 3D model can be manipulated using a hierarchical deformation model, where coarse motions are manipulated using the parent bones, and fine motions are subdivided by the child bones. We present manipulation results in novel poses.
  • Figure 2: (a) The overview of the proposed framework for creating 3D animatable models from videos. Each ray from the image pixel is deformed to the canonical space. Rays are deformed in a coarse-to-fine manner, using the hierarchical neural deformation model. (b) The process of the hierarchical neural deformation model. Coarse motions and fine motions are composited through the bone hierarchy formulation.
  • Figure 3: Bone hierarchy diagram. Subordinate bones inherit the motion of all their parent bones (orange line). The leaf bones are used in calculating the skinning weights (blue line). Bones are gradually added during the optimization. After the optimization, users can add or delete the bones in desired regions.
  • Figure 4: Qualitative comparisons with template-free methods (ViSER, BANMo) and skeleton-based methods (CAMM, RAC). The 3D reconstruction results and the corresponding control points are shown. We omit the eagle result for RAC as it requires skeletons for reconstruction, which are not provided.
  • Figure 5: (a) Qualitative comparison on neural rendering results. (b) Qualitative comparison of the retargeted objects.
  • ...and 15 more figures
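
The bone hierarchy described in the Figure 3 caption can be sketched as a small tree structure. This is an illustrative simplification under stated assumptions, not the paper's formulation: each bone's motion is reduced to a translation (the actual model would compose full rigid transforms), and the names `Bone`, `BoneTree`, `add`, `delete`, `global_offset`, and `leaves` are hypothetical. The sketch shows the two properties the caption states: a bone inherits the motion of all its ancestors, and only leaf bones are the ones passed on to skinning-weight computation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Bone:
    name: str
    parent: Optional[str] = None
    offset: Vec3 = (0.0, 0.0, 0.0)  # local motion; translation only, for simplicity


class BoneTree:
    """Hypothetical tree of bones; children inherit the motion of their ancestors."""

    def __init__(self) -> None:
        self.bones: Dict[str, Bone] = {}

    def add(self, name: str, parent: Optional[str] = None,
            offset: Vec3 = (0.0, 0.0, 0.0)) -> None:
        if parent is not None and parent not in self.bones:
            raise KeyError(f"unknown parent bone: {parent}")
        self.bones[name] = Bone(name, parent, offset)

    def delete(self, name: str) -> None:
        # Only leaf bones are removed here, so no child is left without a parent.
        if any(b.parent == name for b in self.bones.values()):
            raise ValueError(f"bone {name} still has children")
        del self.bones[name]

    def global_offset(self, name: str) -> Vec3:
        """Accumulate the bone's own motion with that of every ancestor."""
        total = [0.0, 0.0, 0.0]
        node: Optional[str] = name
        while node is not None:
            bone = self.bones[node]
            total = [t + o for t, o in zip(total, bone.offset)]
            node = bone.parent
        return (total[0], total[1], total[2])

    def leaves(self) -> List[str]:
        """Bones without children; in this sketch, these would feed skinning weights."""
        parents = {b.parent for b in self.bones.values() if b.parent is not None}
        return sorted(n for n in self.bones if n not in parents)


tree = BoneTree()
tree.add("body", offset=(1.0, 0.0, 0.0))               # coarse motion
tree.add("arm", parent="body", offset=(0.0, 2.0, 0.0))
tree.add("hand", parent="arm", offset=(0.0, 0.0, 0.5))  # fine motion
tree.add("leg", parent="body")
print(tree.global_offset("hand"))
print(tree.leaves())
```

Moving the `body` bone shifts every descendant, while editing `hand` affects only that leaf, which mirrors the coarse-to-fine manipulation the captions describe. Adding or deleting a bone changes the set of leaves, so the skinning inputs would change accordingly.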