TexVocab: Texture Vocabulary-conditioned Human Avatars
Yuxiao Liu, Zhe Li, Yebin Liu, Haoqian Wang
TL;DR
TexVocab builds high-fidelity animatable human avatars from multi-view RGB videos by introducing a texture vocabulary that pairs body poses with texture maps. It constructs texture maps by back-projecting images onto the posed SMPL surface and mapping them to a fixed SMPL UV domain, then learns a body-part-wise embedding to capture pose-dependent texture changes. Pose features are queried via KNN over key body parts, interpolated with skinning-aware attention, and used to condition a NeRF decoder for dynamic appearances. Experiments on THUman4.0, ZJU-MoCap, and DeepCap demonstrate state-of-the-art quality and robust pose generalization, and ablations confirm the benefits of body-part-wise encoding and multi-view texture maps. Limitations include reliance on dense views and SMPL-based clothing representations.
Abstract
To make full use of the available image evidence in multi-view video-based avatar modeling, we propose TexVocab, a novel avatar representation that constructs a texture vocabulary and associates body poses with texture maps for animation. Given multi-view RGB videos, our method first back-projects all the available images in the training videos onto the posed SMPL surface, producing texture maps in the SMPL UV domain. We then construct pairs of human poses and texture maps to establish a texture vocabulary that encodes dynamic human appearances under various poses. Unlike the commonly used joint-wise encoding, we further design a body-part-wise encoding strategy to learn the structural effects of the kinematic chain. Given a driving pose, we query the pose feature hierarchically by decomposing the pose vector into several body parts and interpolating the texture features to synthesize fine-grained human dynamics. Overall, our method creates animatable human avatars with detailed and dynamic appearances from RGB videos, and experiments show that it outperforms state-of-the-art approaches. The project page can be found at https://texvocab.github.io/.
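The body-part-wise query described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy four-joint skeleton, the `BODY_PARTS` grouping, the function names, and the short feature vectors standing in for learned texture features are all hypothetical. The sketch also substitutes plain inverse-distance weighting over the k nearest key poses for whatever interpolation scheme the method actually uses.

```python
import math

# Hypothetical toy skeleton: 4 joints grouped into 2 body parts.
# A real SMPL body has more joints; this grouping is an illustrative assumption.
BODY_PARTS = {"torso": [0, 1], "arm": [2, 3]}


def part_pose(pose, joints):
    """Flatten the per-joint axis-angle vectors of one body part."""
    return [value for j in joints for value in pose[j]]


def query_part(vocab, part, pose, k=2, eps=1e-8):
    """Interpolate one body part's feature from its k nearest key poses."""
    joints = BODY_PARTS[part]
    query = part_pose(pose, joints)
    # Distance from the driving pose to every key pose, restricted to this part.
    scored = [
        (math.dist(query, part_pose(key_pose, joints)), feats[part])
        for key_pose, feats in vocab
    ]
    nearest = sorted(scored, key=lambda item: item[0])[:k]
    # Inverse-distance weights (an assumption, not the paper's weighting scheme).
    weights = [1.0 / (dist + eps) for dist, _ in nearest]
    total = sum(weights)
    dim = len(nearest[0][1])
    return [
        sum(w * feat[i] for w, (_, feat) in zip(weights, nearest)) / total
        for i in range(dim)
    ]


def query_pose(vocab, pose, k=2):
    """Query every body part independently and collect the features."""
    return {part: query_part(vocab, part, pose, k) for part in BODY_PARTS}


# Toy vocabulary: (key pose, per-part feature) pairs. In the method these
# features would come from texture maps; here they are made-up numbers.
rest = [(0.0, 0.0, 0.0)] * 4
bent = [(0.0, 0.0, 0.0)] * 2 + [(1.0, 0.0, 0.0)] * 2
vocab = [
    (rest, {"torso": [1.0, 0.0], "arm": [0.0, 0.0]}),
    (bent, {"torso": [3.0, 0.0], "arm": [2.0, 4.0]}),
]

# A driving pose whose arm lies halfway between the two key poses.
half_bent = [(0.0, 0.0, 0.0)] * 2 + [(0.5, 0.0, 0.0)] * 2
features = query_pose(vocab, half_bent)
print(features)
```

Because each part is matched against the key poses separately, a driving pose can combine, say, an arm configuration from one training frame with a torso configuration from another, which is the intuition behind decomposing the pose vector before interpolation.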
