AvatarVTON: 4D Virtual Try-On for Animatable Avatars

Zicheng Jiang, Jixin Gao, Shengfeng He, Xinzhe Li, Yulong Zheng, Zhaotong Yang, Junyu Dong, Yong Du

TL;DR

AvatarVTON introduces a 4D virtual try-on framework that achieves free pose control and multi-view rendering from a single in-shop garment image. It presents two novel components: the Reciprocal Flow Rectifier (RFR), a prior-free optical-flow correction strategy for temporal consistency, and the Non-Linear Deformer (NLD), which produces view-pose-aware, non-linear garment deformations via Gaussian maps. To benchmark 4D VTON, the authors extend existing baselines with unified modules, and they report higher fidelity, diversity, and dynamic realism than these state-of-the-art baselines, which makes the method well-suited for AR/VR, gaming, and digital-human applications. This work offers a scalable path toward high-fidelity, animatable digital humans with flexible garment editing and efficient training.

Abstract

We propose AvatarVTON, the first 4D virtual try-on framework that generates realistic try-on results from a single in-shop garment image, enabling free pose control, novel-view rendering, and diverse garment choices. Unlike existing methods, AvatarVTON supports dynamic garment interactions under single-view supervision, without relying on multi-view garment captures or physics priors. The framework consists of two key modules: (1) a Reciprocal Flow Rectifier, a prior-free optical-flow correction strategy that stabilizes avatar fitting and ensures temporal coherence; and (2) a Non-Linear Deformer, which decomposes Gaussian maps into view-pose-invariant and view-pose-specific components, enabling adaptive, non-linear garment deformations. To establish a benchmark for 4D virtual try-on, we extend existing baselines with unified modules for fair qualitative and quantitative comparisons. Extensive experiments show that AvatarVTON achieves high fidelity, diversity, and dynamic garment realism, making it well-suited for AR/VR, gaming, and digital-human applications.

Paper Structure

This paper contains 26 sections, 5 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: We propose AvatarVTON, the first solution that enables 4D virtual try-on with free pose control, viewpoint rendering, and diverse garment selection from a single in-shop garment image.
  • Figure 2: Overview of AvatarVTON. Our framework incorporates two key techniques, namely Non-Linear Deformation Transfer, which establishes a mechanism to share nonlinear deformations across source and target tasks, and Reciprocal Flow Rectifier, which iteratively adjusts degraded frames generated by IDM-VTON [idmvton], ensuring smooth variations during training.
  • Figure 3: Qualitative comparison with state-of-the-art approaches on AvatarReX [zheng2023avatarrex].
  • Figure 4: Qualitative comparison in the ablation study.
  • Figure 5: Illustration and results of reciprocal optimization in RFR between the avatar and supervision frames, improving texture stability and cross-frame consistency.
  • ...and 5 more figures