Towards High-fidelity Head Blending with Chroma Keying for Industrial Applications

Hah Min Lew, Sahng-Min Yoo, Hyunwoo Kang, Gyeong-Moon Park

TL;DR

Quantitative and qualitative evaluations on benchmark datasets demonstrate that the proposed CHANGER outperforms state-of-the-art methods, delivering high-fidelity, industrial-grade results.

Abstract

We introduce an industrial Head Blending pipeline for the task of seamlessly integrating an actor's head onto a target body in digital content creation. The key challenge stems from discrepancies in head shape and hair structure, which lead to unnatural boundaries and blending artifacts. Existing methods treat foreground and background as a single task, resulting in suboptimal blending quality. To address this problem, we propose CHANGER, a novel pipeline that decouples background integration from foreground blending. By utilizing chroma keying for artifact-free background generation and introducing Head shape and long Hair augmentation ($H^2$ augmentation) to simulate a wide range of head shapes and hair styles, CHANGER improves generalization across diverse real-world cases. Furthermore, our Foreground Predictive Attention Transformer (FPAT) module enhances foreground blending by predicting and focusing on key head and body regions. Quantitative and qualitative evaluations on benchmark datasets demonstrate that CHANGER outperforms state-of-the-art methods, delivering high-fidelity, industrial-grade results.
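To make the chroma keying idea concrete, the following is a minimal sketch of generic green-screen compositing, not the CHANGER pipeline itself. The function name `chroma_key_composite`, the pure-green key color, and the RGB distance threshold `tol` are all assumptions chosen for illustration; the abstract does not specify how keying is performed.

```python
import numpy as np

def chroma_key_composite(frame, background, key=(0, 255, 0), tol=80.0):
    """Replace pixels close to the key color with the new background.

    All parameters are illustrative assumptions, not values from the paper.
    """
    frame_f = frame.astype(np.float32)
    # Euclidean distance of every pixel to the key color in RGB space.
    dist = np.linalg.norm(frame_f - np.array(key, dtype=np.float32), axis=-1)
    # Foreground mask: 1 where the pixel is far from the key color.
    fg_mask = (dist > tol).astype(np.float32)[..., None]
    # Keep foreground pixels, fill keyed-out pixels from the background.
    out = fg_mask * frame_f + (1.0 - fg_mask) * background.astype(np.float32)
    return out.astype(np.uint8), fg_mask[..., 0]

# Toy example: a 2x2 green frame with a single reddish "foreground" pixel.
frame = np.zeros((2, 2, 3), dtype=np.uint8)
frame[...] = (0, 255, 0)
frame[0, 0] = (200, 30, 30)
background = np.full((2, 2, 3), 128, dtype=np.uint8)
composite, mask = chroma_key_composite(frame, background)
```

Because the background is a flat key color, it can be swapped without any inpainting, which is the property the abstract relies on when it separates background integration from foreground blending.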

Paper Structure

This paper contains 20 sections, 13 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Qualitative comparisons of recent inpainting baselines rombach2022high, yang2023paint, zhou2023propainter and the head blending model shu2022few on sequential frames of a target video. We tested SDI both with and without text prompting (Prompt). For PBE, we tested two separate scenarios, using background (BG) and foreground (FG) references (bottom-left blue boxes of each column).
  • Figure 2: Motivations of our work. We propose CHANGER with real-world applications in mind. As shown in (a), the existing work (H2SB shu2022few) shows severe artifacts in inpainted regions. To inpaint the background flawlessly, we introduce chroma keying into the head blending framework. However, chroma keying alone still yields low-fidelity results when inpainting the body regions hidden by differences in head shape and hair, as marked by the red box in (b). CHANGER generates a high-fidelity foreground with $H^2$ augmentation and the Foreground Predictive Attention Transformer (FPAT), which are explained in Sections \ref{sec:h2_augmentation} and \ref{sec:fpat}, respectively. CHANGER removes artifacts, as shown in the blue boxes of (b) and (c), and can easily switch between various high-fidelity real-world backgrounds. All backgrounds in the figure are from the benchmark dataset quattoni2009recognizing.
  • Figure 2: Visualization of $H^2$ Augmentation. Eq. (2) formulates the input $X$ during training. Inspired by yoo2023fastswap, we apply the same color jitter to both $I_{T}^{green}$ and the ground truth during the training phase. Eq. (3) shows the head shape augmentation. Eq. (4) shows the long hair augmentation.
  • Figure 3: Network overview of CHANGER. (a) We visualize how the network input ($X$) is constructed at training (blue) and test (red) time. We apply $H^2$ augmentation during training to improve the fidelity of the generated image by increasing the diversity of the input. (b) We visualize the network of CHANGER. The head colorizer colorizes the gray head of $X$, and the body blender inpaints the hidden body with a foreground mask-aware attention mechanism. Please refer to the detailed explanations of $H^2$ augmentation and FPAT in Sections \ref{sec:h2_augmentation} and \ref{sec:fpat}, respectively.
  • Figure 3: The foreground mask predicted by FPAT ($M$), the attention map used in the transformer layer (Attention), and the head blending result ($Y$) for an input source image $I_S$ and target image $I_T$. We visualize the similarity between the query patch (red box) and each key patch in the depicted image as an attention map. Blue represents low values and yellow represents high values.
  • ...and 5 more figures
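
The attention-map caption above describes the similarity between one query patch and every key patch. As a rough illustration, the sketch below computes such a map with generic scaled dot-product attention over random patch embeddings. It is an assumption-laden toy, not FPAT: the function name `attention_map`, the embedding size, and the grid size are hypothetical, and the foreground mask that FPAT predicts is not modeled here.

```python
import numpy as np

def attention_map(query, keys):
    """Softmax-normalized scaled dot-product similarity of one query to all keys."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)
    scores -= scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

# Hypothetical setup: a 4x4 grid of patches with 8-dimensional embeddings.
rng = np.random.default_rng(0)
h, w, d = 4, 4, 8
patches = rng.normal(size=(h * w, d))
query_idx = 5  # index of the query patch (the "red box" in the caption)
attn = attention_map(patches[query_idx], patches).reshape(h, w)
```

Reshaping the weights back to the patch grid gives a heat map that can be rendered with a blue-to-yellow colormap, low to high, as in the figure.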