DreamInpainter: Text-Guided Subject-Driven Image Inpainting with Diffusion Models
Shaoan Xie, Yang Zhao, Zhisheng Xiao, Kelvin C. K. Chan, Yandong Li, Yanwu Xu, Kun Zhang, Tingbo Hou
TL;DR
This work introduces DreamInpainter, a diffusion-model-based framework for Text-Guided Subject-Driven Image Inpainting that uses both text prompts and a reference exemplar to guide inpainting. It tackles the copy-paste risk and limited text control by (1) extracting discriminative dense subject features from the UNet downstack, (2) selecting the top-K tokens to preserve identity while enabling edits, and (3) applying a decoupling regularization that imposes text-driven restoration over the entire image. Empirical results on COCOEE and DreamBoothEE show improved realism and stronger alignment with text prompts, along with ablations validating the effectiveness of token selection and decoupling regularization. The approach enables a range of applications from faithful subject insertion to stylized and attribute-edited inpainted content, contributing a practical solution for balanced, controllable inpainting with dual guidance signals.
Abstract
This study introduces Text-Guided Subject-Driven Image Inpainting, a novel task that combines text and exemplar images for image inpainting. While both text and exemplar images have been used independently in previous efforts, their combined use remains unexplored. Accommodating both conditions simultaneously poses a significant challenge because editability must be balanced against subject fidelity. To tackle this challenge, we propose a two-step approach, DreamInpainter. First, we compute dense subject features to ensure accurate subject replication. Then, we employ a discriminative token selection module to eliminate redundant subject details, preserving the subject's identity while allowing changes according to other conditions such as mask shape and text prompts. Additionally, we introduce a decoupling regularization technique to enhance text control in the presence of exemplar images. Our extensive experiments demonstrate the superior performance of our method in terms of visual quality, identity preservation, and text control, showcasing its effectiveness in the context of text-guided subject-driven image inpainting.
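As a rough illustration of the token selection step described above, the sketch below keeps only the K highest-scoring tokens from a set of dense subject features. It is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not specify how tokens are scored, so the L2-norm score used here is a placeholder, and the names `l2_norm_scores` and `select_top_k_tokens` are hypothetical.

```python
from typing import List, Sequence


def l2_norm_scores(tokens: Sequence[Sequence[float]]) -> List[float]:
    """Placeholder scoring (an assumption, not from the paper): L2 norm per token."""
    return [sum(x * x for x in token) ** 0.5 for token in tokens]


def select_top_k_tokens(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring tokens, in original order."""
    k = max(0, min(k, len(scores)))  # clamp k to the valid range
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # restore positional order of the kept tokens


# Four toy "subject tokens" with two feature dimensions each.
tokens = [[0.1, 0.1], [3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
kept = select_top_k_tokens(l2_norm_scores(tokens), k=2)
selected_tokens = [tokens[i] for i in kept]
```

In this toy setting, a smaller K discards more subject detail, leaving more room for the text prompt and mask shape to influence the result, while a larger K retains more of the exemplar's appearance.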
