ART-VITON: Measurement-Guided Latent Diffusion for Artifact-Free Virtual Try-On

Junseo Park, Hyeryung Jang

TL;DR

The paper tackles boundary artifacts in virtual try-on by ensuring non-try-on regions are preserved while garments are accurately aligned. It introduces ART-VITON, a measurement-guided diffusion framework that reformulates VITON as a linear inverse problem and integrates measurement constraints into the diffusion sampling process. Its key contributions are a residual prior-based initialization that reduces train–test mismatch, and an artifact-free sampling pipeline that combines data consistency, frequency-level correction, and periodic standard denoising. Experiments on VITON-HD, DressCode, and SHHQ-1.0 demonstrate improved identity and background preservation, elimination of seams, and robust performance across diverse baselines and domains.

Abstract

Virtual try-on (VITON) aims to generate realistic images of a person wearing a target garment, requiring precise garment alignment in try-on regions and faithful preservation of identity and background in non-try-on regions. While latent diffusion models (LDMs) have advanced alignment and detail synthesis, preserving non-try-on regions remains challenging. A common post-hoc strategy directly replaces these regions with original content, but abrupt transitions often produce boundary artifacts. To overcome this, we reformulate VITON as a linear inverse problem and adopt trajectory-aligned solvers that progressively enforce measurement consistency, reducing abrupt changes in non-try-on regions. However, existing solvers still suffer from semantic drift during generation, leading to artifacts. We propose ART-VITON, a measurement-guided diffusion framework that ensures measurement adherence while maintaining artifact-free synthesis. Our method integrates residual prior-based initialization to mitigate training-inference mismatch and artifact-free measurement-guided sampling that combines data consistency, frequency-level correction, and periodic standard denoising. Experiments on VITON-HD, DressCode, and SHHQ-1.0 demonstrate that ART-VITON effectively preserves identity and background, eliminates boundary artifacts, and consistently improves visual fidelity and robustness over state-of-the-art baselines.
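To make the "linear inverse problem" framing concrete, here is a minimal NumPy sketch of the post-hoc replacement strategy the abstract criticizes. It assumes the measurement is simply the person image restricted to the non-try-on region by a binary mask. The names `measure`, `posthoc_replace`, and `keep_mask` are hypothetical and do not come from the paper, and the random arrays merely stand in for real images.

```python
import numpy as np


def measure(image, keep_mask):
    """Assumed linear measurement operator: keep only non-try-on pixels."""
    return keep_mask[..., None] * image


def posthoc_replace(generated, person, keep_mask):
    """Paste original content into the non-try-on region of a generated image."""
    m = keep_mask[..., None]
    return m * person + (1.0 - m) * generated


rng = np.random.default_rng(0)
person = rng.random((8, 8, 3))      # stand-in for the original person image
generated = rng.random((8, 8, 3))   # stand-in for a try-on model output

# keep_mask is 1 where content must be preserved, 0 in the try-on region.
keep_mask = np.ones((8, 8))
keep_mask[2:6, 2:6] = 0.0

y = measure(person, keep_mask)                        # observed measurement
composite = posthoc_replace(generated, person, keep_mask)

# The composite reproduces the measurement exactly in the preserved region.
residual = np.abs(measure(composite, keep_mask) - y).max()
print("max measurement residual:", residual)
```

The composite satisfies the measurement by construction, but nothing ties the pasted pixels to the generated ones across the mask edge, which is where the abrupt transitions described above arise.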

Paper Structure

This paper contains 25 sections, 16 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Comparison of boundary artifacts across methods. StableVITON generates artifact-free outputs (A) but violates measurements (M). Post-hoc replacement enforces M but introduces seams (violating A). Inverse solvers maintain M but accumulate semantic drift (violating A). ART-VITON satisfies measurement constraints while remaining artifact-free. Green: success (measurement adherence or artifact-free); red: violations or artifacts. Solid and dashed boxes show final and intermediate ($t{=}835$) outputs, respectively.
  • Figure 2: ART-VITON pipeline. (A) Residual prior-based initialization mitigates train-test mismatch. (B) Artifact-free measurement-guided inverse solver enforces measurements while preserving semantics: ① Tweedie estimation retains clothing details but lacks fidelity in non-try-on regions. ② Hard measurement constraints in pixel space correct preserved regions. High-frequency losses during ③ VAE encoding are compensated by ④ Data consistency and ⑤ Frequency correction (shown in (B-1)). (C) Periodic standard denoising realigns trajectories with data manifolds $\mathcal{M}_t$ for smooth blending. (B-2) visualizes this sampling trajectory.
  • Figure 3: Comparison of StableVITON baseline and inverse solvers on VITON-HD. (a) High-frequency loss leads to texture degradation. (b) Boundary artifacts show inconsistencies at region interfaces. Hard-constraint methods (RePaint, MCG) produce sharp transitions; progressive updates (DPS, FIG) show incomplete convergence; and hybrid stochastic methods (DreamSampler, TReg) degrade texture fidelity. Our method preserves both texture fidelity and seamless boundaries.
  • Figure 4: Comparison of baseline models with and without our method across datasets. (a) On VITON-HD, our method removes boundary artifacts while preserving garment details in DCI-VTON, GarDiff, and StableVITON. Heatmaps visualize gradient magnitudes at boundaries. (b) On SHHQ-1.0, cross-domain evaluation (trained on VITON-HD) shows our approach maintains artifact-free results and natural boundary transitions, demonstrating strong generalizability across clothing types and poses.
  • Figure 5: Ablation study of pipeline components. Direct measurement enforcement increases artifacts, while subsequent additions (data consistency, frequency correction, and periodic denoising) progressively reduce them, yielding artifact-free and coherent results.
  • ...and 6 more figures
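
The Figure 4 caption mentions heatmaps of gradient magnitude at region boundaries. The sketch below shows one plausible way such a heatmap and a simple seam score could be computed. It is an illustration under assumptions, not the authors' procedure: the helpers `gradient_magnitude` and `boundary_band`, the central-difference gradient, and the toy images are all hypothetical.

```python
import numpy as np


def gradient_magnitude(gray):
    """Per-pixel gradient magnitude of a 2-D array via finite differences."""
    gy, gx = np.gradient(gray.astype(np.float64))
    return np.hypot(gx, gy)


def boundary_band(keep_mask):
    """Pixels next to the mask edge, i.e. where the mask itself changes."""
    return gradient_magnitude(keep_mask) > 0


# Toy setup: a 16x16 image with a square try-on region in the middle.
keep_mask = np.ones((16, 16))
keep_mask[4:12, 4:12] = 0.0
band = boundary_band(keep_mask)

# A composite with an intensity jump at the mask edge (a seam) ...
seamed = np.full((16, 16), 0.2)
seamed[4:12, 4:12] = 0.8
# ... versus a uniformly blended image with no jump.
smooth = np.full((16, 16), 0.5)

seam_heatmap = gradient_magnitude(seamed)   # what a heatmap would display
seam_score = seam_heatmap[band].mean()
smooth_score = gradient_magnitude(smooth)[band].mean()
print("seam score:", seam_score, "smooth score:", smooth_score)
```

A higher mean gradient inside the boundary band indicates a sharper transition between preserved and generated content. On real images the score also picks up genuine edges such as garment outlines, so it is best read as a relative comparison between methods on the same input.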