An Inpainting-Infused Pipeline for Attire and Background Replacement
Felipe Rodrigues Perche-Mahlow, André Felipe-Zanella, William Alberto Cruz-Castañeda, Marcellus Amadeus
TL;DR
The paper tackles the costly task of manually crafting inpainting masks for clothing and background edits by introducing a depth-guided pipeline. It automatically derives inpainting masks using MiDaS depth estimation, replaces backgrounds with Latent Consistency Models (LCMs) distilled from Stable Diffusion via the SDXL pipeline, and synthesizes new clothing through SD-XL Inpainting guided by textual prompts. Key contributions include an end-to-end workflow that eliminates manual mask annotation, rapid background generation with LCMs, and prompt-driven clothing edits that preserve facial regions. The approach yields coherent, contextually appropriate edits across diverse subjects and backgrounds, though depth accuracy and stylistic artifacts can depend on background blur and prompt choices. Overall, this work demonstrates a practical, scalable framework for advanced image manipulation using state-of-the-art generative models.
Abstract
In recent years, groundbreaking advancements in Generative Artificial Intelligence (GenAI) have triggered a transformative paradigm shift, significantly influencing various domains. In this work, we explore an integrated approach that leverages advanced techniques in GenAI and computer vision, with an emphasis on image manipulation. The methodology unfolds through several stages: depth estimation, the creation of inpainting masks based on depth information, the generation and replacement of backgrounds using Stable Diffusion in conjunction with Latent Consistency Models (LCMs), and the subsequent replacement of clothes and application of aesthetic changes through an inpainting pipeline. Experiments conducted in this study underscore the methodology's efficacy, highlighting its potential to produce visually captivating content. The convergence of these advanced techniques allows users to input photographs of individuals and modify their clothing and background based on specific prompts without manually supplying inpainting masks, effectively placing the subjects within the vast landscape of creative imagination.
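The depth-to-mask stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `depth_to_masks` and `composite`, the fixed `threshold` of 0.5, and the min-max normalization are assumptions made for illustration, since the text here does not say how the depth map is turned into a mask. The sketch also assumes a MiDaS-style relative inverse depth map, in which larger values are closer to the camera.

```python
import numpy as np


def depth_to_masks(depth, threshold=0.5):
    """Split a relative depth map into subject and background masks.

    Assumes inverse depth (larger = closer), as MiDaS produces.
    `threshold` is an illustrative choice, not a value from the paper.
    """
    depth = np.asarray(depth, dtype=np.float32)
    lo, hi = depth.min(), depth.max()
    span = hi - lo
    # Min-max normalize to [0, 1]; a flat map carries no depth contrast.
    norm = (depth - lo) / span if span > 0 else np.zeros_like(depth)
    subject = norm >= threshold  # near pixels: the person
    return subject, ~subject     # far pixels: region to regenerate


def composite(image, background, subject_mask):
    """Keep subject pixels from `image`; take all others from `background`."""
    return np.where(subject_mask[..., None], image, background)


# Toy 2x2 example: the right column is "near", the left column is "far".
depth = [[0.1, 0.9],
         [0.2, 0.8]]
subject, background_mask = depth_to_masks(depth)

photo = np.full((2, 2, 3), 255, dtype=np.uint8)        # stand-in for the input photo
new_background = np.zeros((2, 2, 3), dtype=np.uint8)   # stand-in for a generated background
result = composite(photo, new_background, subject)
```

In a full pipeline, the background mask would mark the region handed to the background generator, and a further mask restricted to the clothing area would be passed to the inpainting model. Those steps depend on model weights and are left out of this sketch.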
