WildAvatar: Learning In-the-wild 3D Avatars from the Web
Zihao Huang, Shoukang Hu, Guangcong Wang, Tianqi Liu, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu
TL;DR
This work tackles the scarcity of real-world 3D avatar data by introducing a fully automated web-video annotation pipeline with robust filtering to mine in-the-wild human motions from YouTube. The resulting WildAvatar dataset contains over 10k subjects and scenes, significantly expanding diversity in pose, viewpoint, and clothing without specialized capture equipment. Empirical results show that the pipeline achieves state-of-the-art SMPL annotation on the EMDB benchmark, that its filtering protocols improve verification metrics on web videos, and that both per-subject and generalizable avatar methods improve when trained on WildAvatar, with notable gains in PSNR, SSIM, and LPIPS. By enabling large-scale, real-world avatar data and releasing code and data, the work aims to advance practical 3D/4D avatar creation and related tasks.
Abstract
Existing research on avatar creation is typically limited to laboratory datasets, which are costly to scale and insufficiently represent the real world. On the other hand, the web abounds with off-the-shelf real-world human videos, but these videos vary in quality and require accurate annotations for avatar creation. To this end, we propose an automatic annotation pipeline with filtering protocols to curate these humans from the web. Our pipeline surpasses state-of-the-art methods on the EMDB benchmark, and the filtering protocols boost verification metrics on web videos. We then curate WildAvatar, a web-scale in-the-wild human avatar creation dataset extracted from YouTube, with $10000+$ different human subjects and scenes. WildAvatar is at least $10\times$ richer than previous datasets for 3D human avatar creation and closer to the real world. To explore its potential, we demonstrate the quality and generalizability of avatar creation methods on WildAvatar. We will publicly release our code, data source links, and annotations to push forward 3D human avatar creation and other related fields for real-world applications.
