Dessie: Disentanglement for Articulated 3D Horse Shape and Pose Estimation from Images

Ci Li; Yi Yang; Zehang Weng; Elin Hernlund; Silvia Zuffi; Hedvig Kjellström

TL;DR

Dessie is the first method to use synthetic data generation and disentanglement to learn to regress 3D shape and pose from images. It focuses on horses, surpasses existing 3D horse reconstruction methods, and generalizes to other large animals such as zebras, cows, and deer.

Abstract

In recent years, 3D parametric animal models have been developed to aid in estimating 3D shape and pose from images and video. While progress has been made for humans, it's more challenging for animals due to limited annotated data. To address this, we introduce the first method using synthetic data generation and disentanglement to learn to regress 3D shape and pose. Focusing on horses, we use text-based texture generation and a synthetic data pipeline to create varied shapes, poses, and appearances, learning disentangled spaces. Our method, Dessie, surpasses existing 3D horse reconstruction methods and generalizes to other large animals like zebras, cows, and deer. See the project website at: https://celiali.github.io/Dessie/.

Paper Structure (24 sections, 5 equations, 8 figures, 3 tables)

Introduction
Related Work
Model-based methods for 3D reconstruction of humans and animals.
Feature Extraction Networks.
Disentanglement.
Method
The hSMAL model
DessiePIPE
Network Architecture
Loss
Experiments
Datasets
Synthetic Dataset.
Real-world Dataset.
Network Implementation Details
...and 9 more sections

Figures (8)

Figure 1: We estimate 3D shape and pose of horses from monocular images. The figure shows pictures of Dessie, a famous racehorse, together with our 3D reconstruction.
Figure 2: DessiePIPE: Synthetic generation pipeline. Top: the training image generation process; bottom: generated data samples with controlled variations.
Figure 3: Network architectures of Dessie. Dessie extracts latent features ($\gamma_A$, $\gamma_P$, $\gamma_G$) with DINO and predicts the hSMAL parameters with the corresponding decoders. The model is trained with keypoint loss, silhouette loss and a contrastive loss to encourage the disentanglement.
Figure 4: Visualization of the leading PCA components of key features for two synthetic and two real images, from top to bottom. For each image, we visualize the original DINO (row 1) and the Dessie key features (row 2).
Figure 5: DinoHMR and Dessie before and after$^{\star}$ real-world data fine-tuning.
...and 3 more figures
