Beyond Imperfections: A Conditional Inpainting Approach for End-to-End Artifact Removal in VTON and Pose Transfer

Aref Tabatabaei, Zahra Dehghanian, Maryam Amirmazlaghani

TL;DR

The paper addresses artifacts that degrade the realism of VTON and pose transfer outputs. It proposes a conditional inpainting framework built on Stable Diffusion, guided by ControlNet and IP-Adapter, and augmented with automatic artifact detection and multi-modal conditioning. It introduces two task-driven datasets, VDI (VTON-HD Distorted Images) and DDI (DeepFashion Distorted Images), which provide artifact masks and references for evaluation. Experiments report improvements on standard image-quality metrics (e.g., SSIM, LPIPS, FID) and favorable ratings from human evaluators, indicating cleaner, more realistic renderings. The method offers an end-to-end solution, and the accompanying public resources aim to advance artifact removal in VTON and pose transfer.
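The TL;DR lists SSIM among the reported metrics. To show roughly what that metric compares, here is a minimal sketch that computes SSIM over a single global window for two flattened grayscale images. This is a simplification chosen for brevity. Standard SSIM averages the same expression over local windows, typically 11×11 and Gaussian-weighted, so its values will differ from library implementations. `global_ssim` is a hypothetical helper, not the paper's evaluation code.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float],
                data_range: float = 255.0) -> float:
    """SSIM computed over one window spanning the whole image.

    x and y are flattened grayscale images of equal length. Standard SSIM
    instead averages this expression over local sliding windows.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    # Stabilizing constants with the usual K1 = 0.01, K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Population variances and covariance (divide by n, not n - 1).
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    clean = [10.0, 50.0, 90.0, 130.0, 170.0, 210.0]
    distorted = [12.0, 48.0, 140.0, 60.0, 175.0, 205.0]
    print(global_ssim(clean, clean))      # identical images score 1.0 (up to rounding)
    print(global_ssim(clean, distorted))  # lower score for the distorted copy
```

Higher SSIM means the output is structurally closer to the reference. LPIPS and FID, also listed above, rely on learned network features and cannot be reproduced in a few lines like this.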

Abstract

Artifacts often degrade the visual quality of virtual try-on (VTON) and pose transfer applications, impacting user experience. This study introduces a novel conditional inpainting technique designed to detect and remove such distortions, improving image aesthetics. Our work is the first to present an end-to-end framework addressing this specific issue, and we developed a specialized dataset of artifacts in VTON and pose transfer tasks, complete with masks highlighting the affected areas. Experimental results show that our method not only effectively removes artifacts but also significantly enhances the visual quality of the final images, setting a new benchmark in computer vision and image processing.
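The abstract describes masks that highlight the affected areas. With such a mask, only the flagged pixels need to be regenerated. In mask-based inpainting, a common final step is to paste the generated content back into the original image through the mask. The sketch below shows that generic step. It is not the authors' documented procedure. It assumes grayscale images stored as nested lists and a mask with values in [0, 1], where 1 marks an artifact pixel. `composite_inpainting` is a hypothetical name.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def composite_inpainting(original: Image, generated: Image, mask: Image) -> Image:
    """Keep original pixels outside the mask and take generated ones inside it.

    Mask values are in [0, 1]: 1 = artifact pixel (replace), 0 = keep,
    fractional values blend linearly (useful for feathered mask edges).
    """
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("images and mask must have the same height")
    result: Image = []
    for row_o, row_g, row_m in zip(original, generated, mask):
        if not (len(row_o) == len(row_g) == len(row_m)):
            raise ValueError("images and mask must have the same width")
        # Per-pixel linear blend: m * generated + (1 - m) * original.
        result.append([m * g + (1.0 - m) * o
                       for o, g, m in zip(row_o, row_g, row_m)])
    return result


if __name__ == "__main__":
    original = [[10.0, 20.0], [30.0, 40.0]]
    generated = [[99.0, 99.0], [99.0, 99.0]]
    mask = [[0.0, 1.0], [0.5, 0.0]]
    print(composite_inpainting(original, generated, mask))
    # [[10.0, 99.0], [64.5, 40.0]]
```

Compositing through the mask leaves pixels outside the detected region untouched. This matters when only a small area of an otherwise good try-on result is distorted.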

Paper Structure

This paper contains 19 sections, 9 figures, and 1 table.

Figures (9)

  • Figure 1: Samples of artifacts in our datasets. Rows a), b), and c) show distinct artifact types: color and texture, deformation of body parts, and garment design. Images are further classified by source dataset: 1) VDI (VTON-HD Distorted Images) and 2) DDI (DeepFashion Distorted Images). The datasets are described in detail in the Datasets section.
  • Figure 2: End-to-End Artifact Removal Model Architecture Overview: In the example shown, the artifact detection model identifies that the hair and the shirt collar are distorted and generates a mask covering the affected region. Conditions and prompts are then generated while, in parallel, a scale generator model computes an impact scale for each condition. In this example, the Canny condition (blue) and IP-Adapter (yellow) have higher impact, while pose (green) and segmentation (purple) have lower impact.
  • Figure 3: Automatic Artifact Detection: Comparison of distorted and target conditions, including the pose image, Canny edge map, color palette, and YOLO detections.
  • Figure 7: Use of IP-Adapter: addressing color artifacts to improve overall image quality and fidelity.
  • Figure 8: Automatic Prompt Generation Process: This process relies on the mask region in the distorted image and its underlying content. A text prompt is generated, and a reference (condition) image is selected for the image prompt.
  • Four further figures (Figures 4, 5, 6, and 9) are not listed here.
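Figure 3 describes artifact detection as a comparison between conditions extracted from the distorted image and the corresponding target conditions. The sketch below illustrates one simple way such a comparison could produce a mask. It rests on assumptions of our own. Each condition, such as a Canny edge map or a rendered pose map, is treated as a binary grid. A tile is flagged when more than `threshold` of its pixels disagree, and the flags from all conditions are merged by union. The tiling rule, the threshold, and the function names are hypothetical. The paper's detector also uses color palettes and YOLO detections, which this sketch does not model.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]  # binary condition map: 1 = feature present (e.g. edge pixel)


def tile_disagreement_mask(distorted: Grid, target: Grid,
                           tile: int = 2, threshold: float = 0.5) -> Grid:
    """Return a pixel mask (1 = suspected artifact) by flagging every
    tile x tile block whose disagreement ratio exceeds `threshold`."""
    h, w = len(distorted), len(distorted[0])
    if len(target) != h or any(len(row) != w for row in distorted + target):
        raise ValueError("condition maps must share the same shape")
    mask = [[0] * w for _ in range(h)]
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            rows = range(top, min(top + tile, h))
            cols = range(left, min(left + tile, w))
            # Count pixels where the two condition maps disagree.
            diff = sum(distorted[r][c] != target[r][c] for r in rows for c in cols)
            if diff / (len(rows) * len(cols)) > threshold:
                for r in rows:
                    for c in cols:
                        mask[r][c] = 1
    return mask


def detect_artifacts(conditions: Dict[str, Tuple[Grid, Grid]],
                     tile: int = 2, threshold: float = 0.5) -> Grid:
    """Union of per-condition masks; each value is (distorted_map, target_map)."""
    if not conditions:
        raise ValueError("at least one condition is required")
    masks = [tile_disagreement_mask(d, t, tile, threshold)
             for d, t in conditions.values()]
    h, w = len(masks[0]), len(masks[0][0])
    return [[int(any(m[r][c] for m in masks)) for c in range(w)]
            for r in range(h)]


if __name__ == "__main__":
    zeros = [[0] * 4 for _ in range(4)]
    conditions = {
        "canny": ([[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]], zeros),
        "pose": ([[0, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0]], zeros),
    }
    for row in detect_artifacts(conditions):
        print(row)
    # [0, 0, 1, 1]
    # [0, 0, 1, 1]
    # [1, 1, 0, 0]
    # [1, 1, 0, 0]
```

Taking the union means that a region flagged by any single condition is sent to inpainting. A learned detector, like the one the paper describes, would replace this fixed threshold rule.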