TIFu: Tri-directional Implicit Function for High-Fidelity 3D Character Reconstruction

Byoungsung Lim, Seong-Whan Lee

TL;DR

The paper addresses single-image 3D reconstruction of highly variable animated characters by introducing Tri-directional Implicit Function (TIFu), a memory-efficient vector-level occupancy representation that infers 3D shape along three orthogonal axes. It combines coarse-to-fine vector inference with high-resolution surface-normal cues and an adaptive BCE loss to reduce depth ambiguity and improve global consistency. The method achieves state-of-the-art performance on Mixamo and THuman2.0 while producing coherent, detailed meshes that generalize to in-the-wild images. By keeping memory usage manageable, it makes high-fidelity character reconstruction from a single image practical for animation, gaming, and VR.

Abstract

Recent advances in implicit function-based approaches have shown promising results in 3D human reconstruction from a single RGB image. However, these methods do not extend well to more general cases and often generate dragged or disconnected body parts, particularly for animated characters. We argue that these limitations stem from the existing point-level 3D shape representation, which lacks holistic 3D context understanding. Voxel-based reconstruction methods are better suited to capturing the entire 3D space at once; however, they are impractical for high-resolution reconstruction because of their excessive memory usage. To address these challenges, we introduce Tri-directional Implicit Function (TIFu), a vector-level representation that increases global 3D consistency while significantly reducing memory usage compared to voxel representations. We also introduce a new algorithm for 3D reconstruction at an arbitrary resolution that aggregates vectors along three orthogonal axes, resolving the inherent problems of regressing vectors of fixed dimension. Our approach achieves state-of-the-art performance on both our self-curated character dataset and a benchmark 3D human dataset. We provide both quantitative and qualitative analyses to support our findings.
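
The abstract describes building a volume by aggregating vectors along three orthogonal axes. The following is a minimal NumPy sketch of that idea, not the authors' implementation. Everything named here is hypothetical: `predict_vector` stands in for the learned network, the toy sphere replaces real predictions, and averaging the three volumes is an assumed aggregation rule, since the abstract does not say how the volumes are combined.

```python
import numpy as np

def stack_vectors(predict_vector, n, axis):
    """Fill an n^3 volume with length-n occupancy vectors running along `axis`."""
    vol = np.empty((n, n, n), dtype=np.float32)
    for i in range(n):
        for j in range(n):
            # Index the two fixed coordinates and leave `axis` as a full slice.
            idx = [i, j]
            idx.insert(axis, slice(None))
            vol[tuple(idx)] = predict_vector(i, j, axis)
    return vol

def aggregate_tri_directional(predict_vector, n, threshold=0.5):
    """Average the three per-axis volumes (an assumed rule) and threshold."""
    volumes = [stack_vectors(predict_vector, n, axis) for axis in range(3)]
    mean_volume = np.mean(volumes, axis=0)
    return mean_volume >= threshold

def make_toy_predictor(n):
    """Hypothetical stand-in for the network: reads rays out of a known sphere."""
    coords = (np.arange(n) + 0.5) / n - 0.5
    x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")
    sphere = (x**2 + y**2 + z**2 <= 0.4**2).astype(np.float32)

    def predict_vector(i, j, axis):
        idx = [i, j]
        idx.insert(axis, slice(None))
        return sphere[tuple(idx)]

    return predict_vector, sphere

n = 16
predictor, sphere = make_toy_predictor(n)
occupancy = aggregate_tri_directional(predictor, n)
```

Because the vector length is just the sampling resolution `n`, the same routine runs at any resolution. As a rough size comparison, one length-512 `float32` vector occupies 2 KiB, whereas a dense 512^3 `float32` grid occupies 512 MiB. The sketch still materializes the final dense volume, so it illustrates the aggregation step only, not the memory behavior of the network. A mesh could then be extracted from `occupancy` with a Marching Cubes implementation such as `skimage.measure.marching_cubes`.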

Paper Structure

This paper contains 12 sections, 9 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Qualitative results from template-based methods for character reconstruction. Template-based methods R10E20 fall short when it comes to reconstructing characters with significant geometrical differences from humans.
  • Figure 2: Overview of our 3D mesh reconstruction. Our approach constructs a 3D space from vectors along three orthogonal axes in a coarse-to-fine manner. The coarse-level module estimates vector-level 3D representations along tri-directional rays for a given query point. We then refine the coarse-level vectors along the depth axis by attending to high-resolution visual cues. The final 3D volume is constructed by stacking the densely acquired vectors and aggregating the resulting three separate volumes. We obtain the 3D mesh by applying Marching Cubes to the final 3D volume.
  • Figure 3: Qualitative comparison. We present results from the Mixamo dataset with varying subjects and dynamic poses.
  • Figure 4: Reconstructed results from a benchmark human dataset. We show high-quality results from THuman2.0. Our results outperform prior works in capturing facial features and clothing folds with natural depictions of poses.
  • Figure 5: Qualitative results of the ablation study. TIFu reduces feature ambiguity along unseen directions. Our adaptive losses effectively mitigate the class imbalance problem that arises when inferring a mostly empty 3D space.
  • ...and 1 more figures
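
The Figure 5 caption notes that occupancy targets are mostly empty, which creates a class imbalance. The sketch below shows why this matters, using a generic class-balanced binary cross-entropy in NumPy. It is not the paper's adaptive loss, whose exact form is not given in this summary. The inverse-frequency weights and the function names `plain_bce` and `balanced_bce` are assumptions made for illustration.

```python
import numpy as np

def plain_bce(pred, target, eps=1e-7):
    """Standard binary cross-entropy averaged over all cells."""
    pred = np.clip(pred, eps, 1.0 - eps)
    loss = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float(loss.mean())

def balanced_bce(pred, target, eps=1e-7):
    """BCE with each class weighted by the other class's frequency (assumed scheme)."""
    pred = np.clip(pred, eps, 1.0 - eps)
    pos_frac = float(target.mean())
    w_pos = 1.0 - pos_frac  # rare occupied cells get the larger weight
    w_neg = pos_frac        # abundant empty cells get the smaller weight
    loss = -(w_pos * target * np.log(pred)
             + w_neg * (1.0 - target) * np.log(1.0 - pred))
    return float(loss.mean())

# Toy target: 10% of cells occupied, 90% empty.
target = np.zeros(1000, dtype=np.float64)
target[:100] = 1.0

all_empty = np.full(1000, 0.01)  # degenerate "everything is empty" prediction
uncertain = np.full(1000, 0.5)   # uninformative prediction

plain_prefers_empty = plain_bce(all_empty, target) < plain_bce(uncertain, target)
balanced_penalizes_empty = balanced_bce(all_empty, target) > balanced_bce(uncertain, target)
```

On this toy target, the unweighted loss scores the degenerate all-empty prediction better than an uninformative one, while the balanced variant reverses that ordering. If a target contains no occupied cells at all, the assumed weights reduce the loss to zero, so a practical version would need to handle that case separately.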