Inpainting-Driven Mask Optimization for Object Removal

Kodai Shimosato, Norimichi Ukita

TL;DR

A mask optimization method that improves the quality of object removal by image inpainting: the inpainting network is trained with object masks extracted by segmentation, and masks of the same kind are used at inference.

Abstract

This paper proposes a mask optimization method for improving the quality of object removal using image inpainting. While many inpainting methods are trained with a set of random masks, a target for inpainting may be an object, such as a person, in many realistic scenarios. This domain gap between masks in training and inference images increases the difficulty of the inpainting task. In our method, this domain gap is resolved by training the inpainting network with object masks extracted by segmentation, and such object masks are also used in the inference step. Furthermore, to optimize the object masks for inpainting, the segmentation network is connected to the inpainting network and end-to-end trained to improve the inpainting performance. The effect of this end-to-end training is further enhanced by our mask expansion loss for achieving the trade-off between large and small masks. Experimental results demonstrate the effectiveness of our method for better object removal using image inpainting.

Paper Structure (23 sections, 8 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Inpainting variability caused by minor mask differences. The mask used for reconstructing (c) is the ground-truth segment of a standing person observed in (a). This mask is eroded and dilated to obtain the masks used for (b) and (d), respectively.
  • Figure 2: Proposed joint segmentation-inpainting network. Blue, green, red, and black arrows indicate the flows of RGB, edge, segmentation, and mask images, respectively. Blue rectangles and gray hourglass-shaped boxes indicate processes and learnable sub-networks, respectively. The inpainting network consists of three inpainting networks, namely the edge, segmentation, and image inpainters, as enclosed by the dotted line. In training and inference steps, "$I$, $I_{G}$, and $M_{G}$" and "$I$" are given to this joint network, respectively.
  • Figure 3: Distance maps for different values of $\alpha$ in Eq. (\ref{eq:constant_expansion_loss}). The distance values are normalized between -1 and 1.
  • Figure 4: Visual results (success cases). An object region is pasted onto the GT image (l) to synthesize the input image (a). This input image is fed into each inpainting network together with a mask image. For the previous inpainting methods \cite{DBLP:journals/tog/IizukaS017, DBLP:conf/cvpr/Yu0YSLH18, DBLP:conf/iccv/YuLYSLH19, NazeriNJQE19, DBLP:conf/cvpr/YiTAJX20, DBLP:conf/mm/YuZWPCLMXM21, DBLP:conf/iccv/YuZLPMXM21, DBLP:conf/cvpr/LiWZDT20, yamashita2021}, the mask image (p) generated by PanopticFPN is used. In contrast, the mask images (j) and (k) are estimated by our methods.
  • Figure 5: Visual results (failure case). The input and ground-truth images are shown in (a) and (c), respectively. The mask and inpainted images of our method with MEL are in (e) and (b), respectively. The mask generated by PanopticFPN is shown in (d) for comparison.
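
The mask manipulations mentioned in the captions of Figures 1 and 3 can be illustrated with a small NumPy sketch. It is not the authors' implementation: the helper names (`dilate`, `erode`, `signed_distance_map`), the 3x3 structuring element, the sign convention (negative inside the mask, positive outside), and the toy 7x7 mask are all assumptions made for illustration, and the role of $\alpha$ in the paper's expansion loss is not modeled here.

```python
import numpy as np


def dilate(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Binary dilation with a 3x3 square structuring element (assumed)."""
    out = mask.astype(bool)
    h, w = out.shape
    for _ in range(iterations):
        # Pad with background so pixels outside the image never grow the mask.
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + h, dx:dx + w]
        out = grown
    return out


def erode(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Binary erosion, computed as the complement of the dilated complement.

    With this construction, the area outside the image counts as foreground,
    so the mask is not eroded from the image border.
    """
    return ~dilate(~mask.astype(bool), iterations)


def signed_distance_map(mask: np.ndarray) -> np.ndarray:
    """Brute-force signed Euclidean distance map, scaled into [-1, 1].

    Assumed convention: negative inside the mask, positive outside.
    Only suitable for small masks (memory grows with the squared pixel count).
    """
    mask = mask.astype(bool)
    inside = mask.ravel()
    if inside.all() or not inside.any():
        raise ValueError("mask must contain both foreground and background")
    ys, xs = np.indices(mask.shape)
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    # Pairwise distances between all pixel centers.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    to_inside = dists[:, inside].min(axis=1)    # 0 for pixels inside the mask
    to_outside = dists[:, ~inside].min(axis=1)  # 0 for pixels outside the mask
    signed = to_inside - to_outside
    signed = signed / np.abs(signed).max()
    return signed.reshape(mask.shape)


# Toy stand-in for an object mask: a 3x3 square inside a 7x7 image.
mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 2:5] = True

eroded = erode(mask)    # smaller mask, may leave parts of the object visible
dilated = dilate(mask)  # larger mask, removes more background context
sdm = signed_distance_map(mask)

print(int(eroded.sum()), int(mask.sum()), int(dilated.sum()))  # 1 9 25
```

Feeding the eroded, original, and dilated masks to the same inpainting model is one way to reproduce the kind of comparison shown in Figure 1. For real images, a library distance transform would be a more practical choice than the brute-force version above.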