ArtiFade: Learning to Generate High-quality Subject from Blemished Images
Shuya Yang, Shaozhe Hao, Yukang Cao, Kwan-Yee K. Wong
TL;DR
ArtiFade tackles blemished subject-driven generation by aligning unblemished and blemished training data and fine-tuning only selected diffusion-model components together with an artifact-free textual embedding. The method constructs paired sets of unblemished and blemished images, applies Textual Inversion to obtain blemished embeddings, and optimizes a dedicated artifact-free embedding while fine-tuning the cross-attention keys and values to reconstruct clean subject images. A purpose-built evaluation benchmark and experiments show superior artifact removal and subject fidelity in both in-distribution and out-of-distribution scenarios, and the approach is compatible with DreamBooth via LoRA. The work targets real-world image collections containing artifacts such as watermarks, stickers, or adversarial noise, where subject-driven generation would otherwise reproduce the artifacts.
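The idea of fine-tuning only the cross-attention keys and values can be illustrated with a small parameter-selection sketch. This is not the authors' code: the parameter names below are hypothetical examples modelled on the `attn2.to_k` / `attn2.to_v` naming that some diffusion U-Net implementations use for cross-attention, and `select_trainable` is an illustrative helper introduced here.

```python
# Minimal sketch, assuming cross-attention layers are tagged "attn2" and
# their key/value projections are named "to_k" / "to_v". Both the naming
# scheme and this helper are assumptions, not taken from the paper.
def select_trainable(param_names, layer_tag="attn2", projections=("to_k", "to_v")):
    """Return the names of cross-attention key/value parameters."""
    selected = []
    for name in param_names:
        parts = name.split(".")
        # Keep a parameter only if it sits in a cross-attention layer
        # and belongs to a key or value projection.
        if layer_tag in parts and any(p in parts for p in projections):
            selected.append(name)
    return selected


# Hypothetical parameter names for illustration only.
example_names = [
    "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_k.weight",
    "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q.weight",
    "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight",
    "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight",
    "down_blocks.0.resnets.0.conv1.weight",
]

trainable = select_trainable(example_names)
for name in trainable:
    print(name)
```

In a real training setup, every parameter outside the selected list would be frozen, and the artifact-free textual embedding would be optimized alongside the selected weights.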
Abstract
Subject-driven text-to-image generation has witnessed remarkable advancements in its ability to learn and capture characteristics of a subject using only a limited number of images. However, existing methods commonly rely on high-quality images for training and may struggle to generate reasonable images when the input images are blemished by artifacts. This is primarily attributed to the inadequate capability of current techniques in distinguishing subject-related features from disruptive artifacts. In this paper, we introduce ArtiFade to tackle this issue and successfully generate high-quality artifact-free images from blemished datasets. Specifically, ArtiFade exploits fine-tuning of a pre-trained text-to-image model, aiming to remove artifacts. The elimination of artifacts is achieved by utilizing a specialized dataset that encompasses both unblemished images and their corresponding blemished counterparts during fine-tuning. ArtiFade also ensures the preservation of the original generative capabilities inherent within the diffusion model, thereby enhancing the overall performance of subject-driven methods in generating high-quality and artifact-free images. We further devise evaluation benchmarks tailored for this task. Through extensive qualitative and quantitative experiments, we demonstrate the generalizability of ArtiFade in effective artifact removal under both in-distribution and out-of-distribution scenarios.
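The paired dataset of unblemished images and blemished counterparts can be pictured with a toy example. The sketch below is an assumption-laden illustration, not the paper's data pipeline: it treats an image as a nested list of grayscale values in [0, 1] and simulates a blemish by alpha-blending a small watermark patch. The helpers `blend_watermark` and `make_pair` and all parameter values are hypothetical.

```python
# Toy sketch of building an (unblemished, blemished) pair. Images are
# nested lists of floats; the blending rule and parameters are assumptions.
def blend_watermark(image, mark, alpha=0.5, top=0, left=0):
    """Alpha-blend a grayscale mark onto a copy of a grayscale image."""
    out = [row[:] for row in image]  # copy so the clean image is untouched
    for i, mark_row in enumerate(mark):
        for j, m in enumerate(mark_row):
            y, x = top + i, left + j
            # Skip mark pixels that fall outside the image bounds.
            if 0 <= y < len(out) and 0 <= x < len(out[y]):
                out[y][x] = (1 - alpha) * out[y][x] + alpha * m
    return out


def make_pair(image, mark, **kwargs):
    """Return the clean image together with its blemished counterpart."""
    return image, blend_watermark(image, mark, **kwargs)


clean_image = [[0.0, 0.0], [0.0, 0.0]]
watermark = [[1.0]]
clean, blemished = make_pair(clean_image, watermark, alpha=0.5, top=1, left=1)
```

During fine-tuning, the blemished image would serve as the input from which a subject embedding is learned, while the clean image would serve as the reconstruction target.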
