
An Inpainting-Infused Pipeline for Attire and Background Replacement

Felipe Rodrigues Perche-Mahlow, André Felipe-Zanella, William Alberto Cruz-Castañeda, Marcellus Amadeus

TL;DR

The paper tackles the costly task of manually crafting inpainting masks for clothing and background edits by introducing a depth-guided pipeline. It automatically derives inpainting masks using MiDaS depth estimation, replaces backgrounds with Latent Consistency Models distilled from Stable Diffusion via the SDXL pipeline, and synthesizes new clothing through SD-XL Inpainting guided by textual prompts. Key contributions include an end-to-end workflow that eliminates manual mask annotation, rapid background generation with LCMs, and prompt-driven clothing edits that preserve facial regions. The approach yields coherent, contextually appropriate edits across diverse subjects and backgrounds, though depth accuracy and stylistic artifacts can depend on background blur and prompt choices. Overall, this work demonstrates a practical, scalable framework for advanced image manipulation using state-of-the-art generative models.

Abstract

In recent years, groundbreaking advancements in Generative Artificial Intelligence (GenAI) have triggered a transformative paradigm shift, significantly influencing various domains. In this work, we explore an integrated approach that leverages advanced techniques in GenAI and computer vision, with an emphasis on image manipulation. The methodology unfolds through several stages, including depth estimation, the creation of inpainting masks based on depth information, the generation and replacement of backgrounds utilizing Stable Diffusion in conjunction with Latent Consistency Models (LCMs), and the subsequent replacement of clothes and application of aesthetic changes through an inpainting pipeline. Experiments conducted in this study underscore the methodology's efficacy, highlighting its potential to produce visually captivating content. The convergence of these advanced techniques allows users to input photographs of individuals and modify their clothing and background based on specific prompts without manually supplying inpainting masks, effectively placing the subjects within the vast landscape of creative imagination.

Paper Structure (11 sections, 1 equation, 4 figures)

Figures (4)

  • Figure 1: Steps for inpainting mask processing. Image a) is the original image of a woman holding a bouquet. Image b) is the grayscale depth map produced by the MiDaS algorithm. Image c) is the result after applying the depth threshold, and image d) is the final mask after applying the facial recognition algorithm.
  • Figure 2: Flowchart of the complete pipeline
  • Figure 3: Pipeline outputs e)-h) obtained from base images a)-d). The background prompt ($B_P$), clothes prompt ($C_P$), and depth threshold ($T_h$) were, respectively: e) pirate ship, pirate clothes, 0.6; f) magic castle painting, girl wizard clothes, 0.7; g) alien invasion, hero girl with fire powers, 0.6; h) black and white medieval battlefield painting, black and white metal battle armor, 0.5.
  • Figure 4: Pipeline outputs e)-h) obtained from base images a)-d). The background prompt ($B_P$), clothes prompt ($C_P$), and depth threshold ($T_h$) were, respectively: e) space station exterior ultra HD 4K, astronaut suit ultra HD 4K, 0.7; f) ancient Egypt, ancient Egypt clothes pharaoh clothes, 0.7; g) princess castle, princess clothes, 0.6; h) industrial kitchen, chef clothes, 0.5.
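
The mask-processing steps in Figure 1 (depth map, threshold, face exclusion) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that the depth map follows the MiDaS convention of larger values for closer pixels, that it is min-max normalized to [0, 1] before the threshold $T_h$ is applied, and that the face is given as a rectangular box from some external detector. The function names `normalize_depth`, `subject_mask`, and `clothes_mask`, the box format, and the toy depth values are all hypothetical.

```python
import numpy as np


def normalize_depth(depth):
    """Min-max normalize a depth map to [0, 1] (assumed preprocessing step)."""
    depth = np.asarray(depth, dtype=np.float32)
    lo, hi = depth.min(), depth.max()
    if hi == lo:
        # A flat depth map carries no foreground/background information.
        return np.zeros_like(depth)
    return (depth - lo) / (hi - lo)


def subject_mask(depth, threshold):
    """True where the normalized depth is at or above the threshold.

    Assumption: larger values mean closer to the camera, so the True region
    approximates the person in the foreground.
    """
    return normalize_depth(depth) >= threshold


def clothes_mask(depth, threshold, face_box=None):
    """Subject mask with an optional face rectangle removed.

    face_box is a hypothetical (top, left, bottom, right) box in pixel
    coordinates, with bottom and right exclusive.
    """
    mask = subject_mask(depth, threshold).copy()
    if face_box is not None:
        top, left, bottom, right = face_box
        mask[top:bottom, left:right] = False  # keep the face untouched
    return mask


# Toy 4x4 "depth map": the two middle columns stand in for a nearby subject.
depth = np.array([
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 8.0, 9.0, 0.0],
    [2.0, 9.0, 10.0, 1.0],
    [1.0, 9.0, 9.0, 2.0],
])

subject = subject_mask(depth, threshold=0.6)
clothes = clothes_mask(depth, threshold=0.6, face_box=(1, 1, 2, 3))
background = ~subject  # region a background-replacement step would repaint

print(subject.astype(int))
print(clothes.astype(int))
```

Under these assumptions, raising the threshold shrinks the subject region and lowering it grows the region, which is consistent with the per-image $T_h$ values between 0.5 and 0.7 listed in the captions for Figures 3 and 4.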