Avatar4D: Synthesizing Domain-Specific 4D Humans for Real-World Pose Estimation
Jerrin Bright, Zhibo Wang, Dmytro Klepachevskyi, Yuhao Chen, Sirisha Rambhatla, David Clausi, John Zelek
TL;DR
The paper addresses the need for domain-specific, labeled 4D human motion data to improve pose estimation. It introduces Avatar4D, a three-stage pipeline that generates controllable, photorealistic 4D humans, and uses it to build Syn2Sport, a large synthetic sports dataset. Extensive experiments show strong supervised performance, zero-shot transfer to real data, and cross-sport generalization, and feature-space alignment analyses support these findings. The results suggest that synthetic data can reduce reliance on real annotations while enabling scalable, transferable human motion modeling for domain-specific tasks.
Abstract
We present Avatar4D, a pipeline for generating customizable, real-world-transferable synthetic human motion datasets tailored to domain-specific applications. Unlike prior works, which focus on general, everyday motions and offer limited flexibility, our approach provides fine-grained control over body pose, appearance, camera viewpoint, and environmental context, without requiring any manual annotations. To validate the impact of Avatar4D, we focus on sports, where domain-specific actions and movement patterns pose unique challenges for motion understanding. In this setting, we introduce Syn2Sport, a large-scale synthetic dataset spanning multiple sports, including baseball and ice hockey. Avatar4D produces high-fidelity 4D (3D geometry over time) human motion sequences with varied player appearances rendered in diverse environments. We benchmark several state-of-the-art pose estimation models on Syn2Sport and demonstrate its effectiveness for supervised learning, zero-shot transfer to real-world data, and generalization across sports. Furthermore, we evaluate how closely the generated synthetic data aligns with real-world datasets in feature space. Our results highlight the potential of such systems to generate scalable, controllable, and transferable human motion datasets for diverse domain-specific tasks without relying on domain-specific real data.
