SMAL-pets: SMAL Based Avatars of Pets from Single Image

Piotr Borycki, Joanna Waczyńska, Yizhe Zhu, Yongqiang Gao, Przemysław Spurek

Abstract

Creating high-fidelity, animatable 3D dog avatars remains a formidable challenge in computer vision. Unlike human digital doubles, animal reconstruction suffers from a critical shortage of large-scale, annotated datasets for specialized applications. Furthermore, the immense morphological diversity across species, breeds, and crosses, which vary significantly in size, proportions, and features, hampers the generalization of existing models. Current reconstruction methods also often struggle to capture realistic fur textures. Additionally, making these avatars fully editable and capable of complex, naturalistic movement typically requires labor-intensive manual mesh manipulation and expert rigging. This paper introduces SMAL-pets, a comprehensive framework that generates high-quality, editable animal avatars from a single input image. Our approach bridges the gap between reconstruction and generative modeling through a hybrid architecture that integrates 3D Gaussian Splatting with the SMAL parametric model, yielding a representation that is both visually high-fidelity and anatomically grounded. We introduce a multimodal editing suite that enables users to refine the avatar's appearance and execute complex animations through direct textual prompts. By allowing users to control both the aesthetic and behavioral aspects of the model via natural language, SMAL-pets provides a flexible, robust tool for animation and virtual reality.
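The hybrid representation can be illustrated with a minimal sketch of a Gaussian whose center is bound to one triangle of the SMAL mesh. The parameterization used here (barycentric weights plus a signed offset along the face normal) is an assumption chosen for illustration, not necessarily the one SMAL-pets uses, and `FaceBoundGaussian` and `gaussian_center` are hypothetical names. Because the center is recomputed from the posed vertices, changing the SMAL pose moves and reorients every attached Gaussian without manual rigging.

```python
from dataclasses import dataclass
import math

Vec3 = tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / length, v[1] / length, v[2] / length)


@dataclass
class FaceBoundGaussian:
    face: int      # index of the mesh face the Gaussian is attached to (hypothetical layout)
    bary: Vec3     # barycentric weights on that face, assumed to sum to 1
    offset: float  # signed displacement along the face normal


def gaussian_center(g: FaceBoundGaussian, vertices: list[Vec3],
                    faces: list[tuple[int, int, int]]) -> Vec3:
    """Recompute the Gaussian center from the current (posed) mesh vertices."""
    i, j, k = faces[g.face]
    a, b, c = vertices[i], vertices[j], vertices[k]
    w0, w1, w2 = g.bary
    # Point on the triangle selected by the barycentric weights.
    p = tuple(w0 * a[d] + w1 * b[d] + w2 * c[d] for d in range(3))
    # Unit normal following the triangle's vertex winding order.
    n = normalize(cross(sub(b, a), sub(c, a)))
    return tuple(p[d] + g.offset * n[d] for d in range(3))


faces = [(0, 1, 2)]
g = FaceBoundGaussian(face=0, bary=(1 / 3, 1 / 3, 1 / 3), offset=0.1)
rest = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# The same face rotated by 90 degrees about the x-axis, as a pose change would do.
posed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
print(gaussian_center(g, rest, faces))   # approximately (0.333, 0.333, 0.1)
print(gaussian_center(g, posed, faces))  # approximately (0.333, -0.1, 0.333)
```

A complete implementation would also rotate each Gaussian's covariance with the local face frame; that step is omitted here to keep the sketch short.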

Paper Structure (10 sections, 11 equations, 15 figures, 5 tables)

Figures (15)

  • Figure 1: We present SMAL-pets, a novel framework for generating a fully animatable dog avatar from a single image. Our method achieves state-of-the-art performance by integrating a parametric SMAL mesh with Gaussian primitives. The parametric mesh provides explicit structural control, enabling pose manipulation through interpretable position parameters. We show that our approach supports text-driven avatar animation, enabling users to modify motion and appearance via natural-language prompts.
  • Figure 2: Our model SMAL-pets generates a dog avatar from a single input image by combining Gaussian primitives with a parametric SMAL mesh model. First, a dense multi-view dataset is synthesised and treated as pseudo ground truth. Stage 1: Gaussian primitives are bound to the faces of the SMAL model, and the Gaussian and SMAL parameters are jointly optimized. Stage 2: Gaussians are allowed to detach from their associated faces, and adaptive density control is used to capture finer details such as fur. During this stage, the mesh itself is only slightly refined, while the primary improvements occur in the Gaussian representation.
  • Figure 3: To construct a synthetic dataset representing pseudo ground truth, we apply our method to 3D assets initially generated by three state-of-the-art image-to-3D approaches: TripoSG, SAM3D, and Trellis. Since our model uses Gaussian primitives, we can faithfully reproduce the geometry and appearance of these generated assets while maintaining structural consistency. Furthermore, by leveraging Gaussian-editing methods such as DGE, the dog avatar can be improved, particularly in the quality of high-frequency details, mitigating the common "plastic" appearance of the initially generated assets.
  • Figure 4: Visual comparison with the state-of-the-art method DogRecon for single-image dog avatar reconstruction. Both approaches represent the avatar using 3D Gaussians combined with the parametric SMAL model. Our method recovers finer details. Additionally, the SMAL representation enables animation through direct manipulation of its parameters.
  • Figure 5: Visual comparison with DogRecon from three viewpoints. Our method preserves more appearance detail, resulting in a more realistic rendering.
  • ...and 10 more figures
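Stage 2 in Figure 2 relies on adaptive density control. As a rough illustration, the sketch below encodes the clone/split/prune decision rule introduced with the original 3D Gaussian Splatting method. The threshold values are assumed default-style numbers, the names `GaussianStats` and `densify_action` are hypothetical, and the captions do not state which thresholds or which variant of the rule SMAL-pets uses.

```python
from dataclasses import dataclass


@dataclass
class GaussianStats:
    grad_norm: float  # mean view-space positional gradient magnitude since the last densification step
    max_scale: float  # largest of the Gaussian's three scale values
    opacity: float    # opacity after the sigmoid, in [0, 1]


def densify_action(g: GaussianStats,
                   grad_threshold: float = 0.0002,
                   percent_dense: float = 0.01,
                   scene_extent: float = 1.0,
                   min_opacity: float = 0.005) -> str:
    """Return 'prune', 'clone', 'split', or 'keep' for a single Gaussian."""
    # Nearly transparent Gaussians contribute little and are removed
    # (checked first here for simplicity).
    if g.opacity < min_opacity:
        return "prune"
    # A large positional gradient marks a poorly reconstructed region, e.g. fur.
    if g.grad_norm >= grad_threshold:
        if g.max_scale <= percent_dense * scene_extent:
            return "clone"  # small Gaussian: duplicate it to cover missing detail
        return "split"      # large Gaussian: replace it with smaller ones
    return "keep"


stats = [GaussianStats(0.001, 0.005, 0.9),
         GaussianStats(0.001, 0.05, 0.9),
         GaussianStats(0.0001, 0.05, 0.9),
         GaussianStats(0.001, 0.05, 0.001)]
print([densify_action(s) for s in stats])  # ['clone', 'split', 'keep', 'prune']
```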