Assessing Image Inpainting via Re-Inpainting Self-Consistency Evaluation

Tianyi Chen; Jianfu Zhang; Yan Hong; Yiyi Zhang; Liqing Zhang

TL;DR

The paper tackles the bias that arises when inpainting evaluation requires ground-truth unmasked reference images. It introduces a self-supervised, multi-pass re-inpainting framework: the output $\hat{X}_1$ of the inpainting method under evaluation, $F_1$, is masked again with $K$ second masks and re-inpainted by a second inpainting network, giving re-inpainted images $\hat{X}_2^k$. Self-consistency is then measured by $D(F_1) = \frac{1}{K} \sum_{k=1}^K d(\hat{X}_1, \hat{X}_2^k)$, where $d$ is a perceptual sub-metric. By using patch masks and LPIPS as the sub-metric, the method remains robust to different second-network choices and mask configurations, and it does not rely on the original unmasked image. Extensive experiments on Places2 with five diverse inpainting methods show that the proposed framework correlates well with human judgments and NR-IQA baselines while mitigating biases associated with traditional evaluation metrics.
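The metric $D(F_1)$ can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the grid-based patch mask, and the default values for `num_passes` (playing the role of $K$), `patch`, and `ratio` are all hypothetical. The summary names LPIPS as the sub-metric $d$ and a second inpainting network for re-inpainting; here both are replaced by simple stand-ins (a mean-fill "inpainter" and a mean absolute difference) so that the example runs with NumPy alone.

```python
import numpy as np

def random_patch_mask(height, width, patch=16, ratio=0.4, rng=None):
    """Hypothetical patch mask: each patch-sized grid cell is masked with probability `ratio`."""
    rng = np.random.default_rng() if rng is None else rng
    grid_h = -(-height // patch)  # ceiling division
    grid_w = -(-width // patch)
    cells = rng.random((grid_h, grid_w)) < ratio
    # Expand each grid cell to a patch x patch block, then crop to the image size.
    mask = np.repeat(np.repeat(cells, patch, axis=0), patch, axis=1)
    return mask[:height, :width]

def mean_fill_inpaint(image, mask):
    """Stand-in for the second inpainting network: fill masked pixels with the mean of the rest."""
    out = image.copy()
    if mask.all():
        fill = image.mean(axis=(0, 1))
    else:
        fill = image[~mask].mean(axis=0)
    out[mask] = fill
    return out

def mean_abs_distance(a, b):
    """Stand-in for the sub-metric d (LPIPS in the summary above)."""
    return float(np.mean(np.abs(a - b)))

def self_consistency_score(x1_hat, inpaint2, distance,
                           num_passes=10, patch=16, ratio=0.4, seed=0):
    """Average d(x1_hat, x2_hat_k) over `num_passes` re-inpainting passes."""
    rng = np.random.default_rng(seed)
    height, width = x1_hat.shape[:2]
    scores = []
    for _ in range(num_passes):
        mask2 = random_patch_mask(height, width, patch, ratio, rng)  # second mask
        x2_hat = inpaint2(x1_hat, mask2)                             # re-inpainted image
        scores.append(distance(x1_hat, x2_hat))
    return float(np.mean(scores))

if __name__ == "__main__":
    # x1_hat would normally be the output of the inpainting method under evaluation.
    x1_hat = np.random.default_rng(1).random((64, 64, 3))
    print(self_consistency_score(x1_hat, mean_fill_inpaint, mean_abs_distance))
```

Note that the original unmasked image never appears in the computation: only the first inpainted result and its re-inpainted variants are compared. Passing `inpaint2` and `distance` as arguments keeps the sketch agnostic to the actual second network and sub-metric, which would be swapped in for a real evaluation.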

Abstract

Image inpainting, the task of reconstructing missing segments in corrupted images using available data, faces challenges in ensuring consistency and fidelity, especially under information-scarce conditions. Traditional evaluation methods, heavily dependent on the existence of unmasked reference images, inherently favor certain inpainting outcomes, introducing biases. Addressing this issue, we introduce an innovative evaluation paradigm that utilizes a self-supervised metric based on multiple re-inpainting passes. This approach, diverging from conventional reliance on direct comparisons in pixel or feature space with original images, emphasizes the principle of self-consistency to enable the exploration of various viable inpainting solutions, effectively reducing biases. Our extensive experiments across numerous benchmarks validate the alignment of our evaluation method with human judgment.

Paper Structure (24 sections, 3 equations, 17 figures, 5 tables, 2 algorithms)

Introduction
Related Works
Image Inpainting
Perceptual Metrics
Methodology
Notations
The Proposed Framework
Alleviating Bias with Patch Masks
Experiments
Implementation Details
Inpainting Methods and Dataset
Masks
Choice of Metric Objective
Choice of Sub-Metric and the Second Inpainting Network
Choice of Second Mask Ratio
...and 9 more sections

Figures (17)

Figure 1: An example showing the potential variations in inpainted results for a single image. A large masked area may cover crucial content that inpainting methods cannot accurately restore, which leads to inpainted images with multiple possible layouts. Comparing the inpainted images directly to the original images can therefore introduce bias into the evaluation process.
Figure 2: Overview of our proposed image inpainting metric. We incorporate a multi-pass approach to enhance evaluation stability by iteratively re-inpainting the inpainted images using multiple patch masks. This iterative process allows us to calculate the perceptual metric between the inpainted images and the corresponding re-inpainted images, thereby capturing the consistency and fidelity of the inpainting method.
Figure 3: Comparison of inpainted images masked by a normal mask and by a patch mask. The first five panels (ground truth, normal mask example, image inpainted with the normal mask, patch mask example, image inpainted with the patch mask) show image examples under different settings. The comparison panel shows the distribution of LPIPS scores, relative to the original image, for the two types of masks (normal or patch). For each type of mask, Stable Diffusion is run with 100 different random seeds on the same mask and the same original image.
Figure 4: The LPIPS score distribution of three metric objectives.
Figure 5: Examples of synthesized images. From left to right: natural image, blended image, and noised image.
...and 12 more figures
