CloseUpAvatar: High-Fidelity Animatable Full-Body Avatars with Mixture of Multi-Scale Textures
David Svitov, Pietro Morerio, Lourdes Agapito, Alessio Del Bue
TL;DR
CloseUpAvatar introduces a full-body avatar representation built from textured billboards that carry two levels of texture detail (MST^L and MST^H). A Mixture of Multi-Scale Textures blends these levels with a camera-distance–dependent coefficient, enabling high-fidelity close-ups while preserving efficiency for distant views. The approach combines billboard splatting, SMPL-X alignment, and a training regime with camera augmentation and regularization so that training converges across diverse viewpoints. It achieves state-of-the-art perceptual metrics at both close and far views while rendering in real time. Experiments on ActorsHQ demonstrate strong qualitative and quantitative gains, though limitations remain in the guidance of non-rigid finger and face details and under extreme deformations of large primitives.
Abstract
We present CloseUpAvatar, a novel approach for articulated human avatar representation that handles more general camera motions while preserving rendering quality for close-up views. CloseUpAvatar represents an avatar as a set of textured planes with two sets of learnable textures, one for low-frequency and one for high-frequency detail. The method automatically switches to the high-frequency textures only for cameras positioned close to the avatar's surface and gradually reduces their impact as the camera moves farther away. This parametrization lets CloseUpAvatar adjust rendering quality based on camera distance, ensuring realistic rendering across a wider range of camera orientations than previous approaches. We provide experiments on the ActorsHQ dataset with high-resolution input images. CloseUpAvatar demonstrates both qualitative and quantitative improvements over existing methods when rendering from a wide range of novel camera positions, while maintaining high FPS by limiting the number of required primitives.
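The distance-dependent switching described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the linear ramp, the `near` and `far` thresholds, and the convex blend of the two texture levels are all assumptions chosen for illustration, since the abstract does not specify the exact form of the blending coefficient.

```python
import numpy as np


def blend_weight(distance, near=0.5, far=2.0):
    """Hypothetical weight of the high-frequency texture.

    Returns 1 at or below `near`, 0 at or beyond `far`, and a linear
    ramp in between. The thresholds are illustrative assumptions.
    """
    d = np.asarray(distance, dtype=np.float64)
    return np.clip((far - d) / (far - near), 0.0, 1.0)


def mix_textures(tex_low, tex_high, distance, near=0.5, far=2.0):
    """Blend two texture levels with a distance-dependent coefficient.

    A convex combination is assumed here; the actual mixing rule used
    by the method may differ.
    """
    w = blend_weight(distance, near, far)
    return (1.0 - w) * tex_low + w * tex_high


if __name__ == "__main__":
    low = np.zeros((4, 4, 3))   # stand-in for a low-frequency texture
    high = np.ones((4, 4, 3))   # stand-in for a high-frequency texture
    for d in (0.25, 1.25, 3.0):
        mixed = mix_textures(low, high, d)
        print(d, float(mixed.mean()))
```

With these stand-in textures, the mean of the blended result equals the weight of the high-frequency level, so it falls from 1 for the closest camera to 0 for the farthest one.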
