Learning the 3D Fauna of the Web
Zizhang Li, Dor Litvak, Ruining Li, Yunzhi Zhang, Tomas Jakab, Christian Rupprecht, Shangzhe Wu, Andrea Vedaldi, Jiajun Wu
TL;DR
3D-Fauna learns a single pan-category deformable 3D model covering more than 100 quadruped species from 2D Internet images alone. Its key component, the Semantic Bank of Skinned Models (SBSM), automatically discovers a compact set of base shapes by combining geometric priors with features from an off-the-shelf self-supervised extractor, so that shape priors are shared across categories rather than learned per species. The model reconstructs an articulated 3D mesh from a single image in a feed-forward pass and is trained on the new Fauna Dataset, which spans 128 species. The paper reports qualitative and quantitative results, with ablations supporting the contribution of both the SBSM and a mask-based regularizer. The approach is a step toward scalable 3D animal modeling from uncontrolled Internet data, but it is limited to a quadruped skeleton and requires some data curation.
Abstract
Learning 3D models of all animals on the Earth requires massively scaling up existing solutions. With this ultimate goal in mind, we develop 3D-Fauna, an approach that learns a pan-category deformable 3D animal model for more than 100 animal species jointly. One crucial bottleneck of modeling animals is the limited availability of training data, which we overcome by simply learning from 2D Internet images. We show that prior category-specific attempts fail to generalize to rare species with limited training images. We address this challenge by introducing the Semantic Bank of Skinned Models (SBSM), which automatically discovers a small set of base animal shapes by combining geometric inductive priors with semantic knowledge implicitly captured by an off-the-shelf self-supervised feature extractor. To train such a model, we also contribute a new large-scale dataset of diverse animal species. At inference time, given a single image of any quadruped animal, our model reconstructs an articulated 3D mesh in a feed-forward fashion within seconds.
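To make the idea of a "bank" of base shapes more concrete, here is a minimal sketch of one way such a lookup could work. It assumes an attention-style soft query: an image feature is compared against a set of learned keys, and the resulting weights blend the associated value vectors into one shape code. The abstract does not specify this mechanism, so the function `query_bank`, the `temperature` parameter, and the toy keys and values are all hypothetical illustrations, not the authors' implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def query_bank(feature, keys, values, temperature=0.1):
    """Soft lookup in a key-value bank (assumed mechanism, for illustration).

    feature: image feature vector, e.g. from a self-supervised extractor
    keys:    one key vector per bank entry
    values:  one shape-code vector per bank entry
    Returns the blended shape code and the softmax weights over entries.
    """
    sims = [cosine(feature, k) for k in keys]
    # Softmax over similarities, shifted by the max for numerical stability.
    top = max(sims)
    exps = [math.exp((s - top) / temperature) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the value vectors gives the instance's base-shape code.
    dim = len(values[0])
    code = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return code, weights


# Toy bank with three entries; in practice keys and values would be learned.
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
feature = [0.1, 0.9, 0.05]  # closest in direction to the second key

shape_code, weights = query_bank(feature, keys, values)
```

Under this sketch, images with similar features retrieve similar blends of base shapes, which is one plausible way for a species with few training images to borrow shape structure from related, better-represented ones.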
