AvatarVTON: 4D Virtual Try-On for Animatable Avatars
Zicheng Jiang, Jixin Gao, Shengfeng He, Xinzhe Li, Yulong Zheng, Zhaotong Yang, Junyu Dong, Yong Du
TL;DR
AvatarVTON is a 4D virtual try-on framework that supports free pose control and novel-view rendering from a single in-shop garment image. It introduces two components: the Reciprocal Flow Rectifier (RFR), a prior-free optical-flow correction strategy that stabilizes avatar fitting and keeps results temporally coherent, and the Non-Linear Deformer (NLD), which decomposes Gaussian maps into view-pose-invariant and view-pose-specific components to produce adaptive, non-linear garment deformations. To benchmark 4D virtual try-on, the authors extend existing baselines with unified modules for fair comparison. Experiments report high fidelity, diversity, and dynamic garment realism, making the method well-suited for AR/VR, gaming, and digital-human applications.
Abstract
We propose AvatarVTON, the first 4D virtual try-on framework that generates realistic try-on results from a single in-shop garment image, enabling free pose control, novel-view rendering, and diverse garment choices. Unlike existing methods, AvatarVTON supports dynamic garment interactions under single-view supervision, without relying on multi-view garment captures or physics priors. The framework consists of two key modules: (1) a Reciprocal Flow Rectifier, a prior-free optical-flow correction strategy that stabilizes avatar fitting and ensures temporal coherence; and (2) a Non-Linear Deformer, which decomposes Gaussian maps into view-pose-invariant and view-pose-specific components, enabling adaptive, non-linear garment deformations. To establish a benchmark for 4D virtual try-on, we extend existing baselines with unified modules for fair qualitative and quantitative comparisons. Extensive experiments show that AvatarVTON achieves high fidelity, diversity, and dynamic garment realism, making it well-suited for AR/VR, gaming, and digital-human applications.
