Panoramic Image Inpainting With Gated Convolution And Contextual Reconstruction Loss

Li Yu, Yanjun Gao, Farhad Pakdaman, Moncef Gabbouj

TL;DR

This work tackles panoramic image inpainting by operating on Cubemap Projection (CMP) inputs and employing a two-generator architecture (Face Generator and Cube Generator) that uses gated convolutions to distinguish valid from invalid pixels. A side branch trained with a contextual reconstruction (CR) loss guides the generators toward the most appropriate reference patches, while the Slice and Whole discriminators enforce per-face realism and inter-face consistency, respectively. Training uses a WGAN framework with gradient penalty, augmented by $L_1$ losses on the masked and unmasked regions and by the CR loss $L_{CR}$, which reduces ambiguity in patch selection. On the SUN360 dataset, the method achieves higher PSNR and SSIM than state-of-the-art methods across mask ratios. Ablation studies confirm the contributions of gated convolutions and the CR loss, and the CMP input reduces pole distortion compared to approaches based on equirectangular projection (ERP).
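The loss terms named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the mask convention (1 = missing pixel), the per-region normalisation, the function names, and the weights `w_hole`, `w_valid`, and `w_cr` are all assumptions made for illustration, and the adversarial and CR terms are passed in as plain numbers rather than computed.

```python
import numpy as np


def masked_l1_losses(output, target, mask):
    """Return (hole_l1, valid_l1) for one inpainted image.

    Assumed convention: mask == 1 marks missing (masked) pixels and
    mask == 0 marks valid pixels. Each term is averaged over its own region.
    """
    output = np.asarray(output, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mask = np.broadcast_to(np.asarray(mask, dtype=np.float64), output.shape)

    abs_err = np.abs(output - target)
    hole_l1 = (abs_err * mask).sum() / max(mask.sum(), 1.0)
    valid_l1 = (abs_err * (1.0 - mask)).sum() / max((1.0 - mask).sum(), 1.0)
    return hole_l1, valid_l1


def generator_objective(adv_term, hole_l1, valid_l1, cr_loss,
                        w_hole=1.0, w_valid=1.0, w_cr=1.0):
    """Weighted sum of the generator-side terms (weights are hypothetical)."""
    return adv_term + w_hole * hole_l1 + w_valid * valid_l1 + w_cr * cr_loss


# Toy example: a 4x4 image whose top-left 2x2 block is masked.
target = np.zeros((4, 4))
mask = np.zeros((4, 4))
mask[:2, :2] = 1.0
output = np.where(mask == 1.0, 0.5, 0.1)  # larger error inside the hole

hole_l1, valid_l1 = masked_l1_losses(output, target, mask)
total = generator_objective(adv_term=-0.2, hole_l1=hole_l1,
                            valid_l1=valid_l1, cr_loss=0.3)
```

Keeping the masked and unmasked terms separate lets them be weighted independently, which is one common reason for splitting the $L_1$ loss this way.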

Abstract

Deep learning-based methods have demonstrated encouraging results in tackling the task of panoramic image inpainting. However, it is challenging for existing methods to distinguish valid pixels from invalid pixels and to find suitable references for corrupted areas, which leads to artifacts in the inpainted results. In response to these challenges, we propose a panoramic image inpainting framework that consists of a Face Generator, a Cube Generator, a side branch, and two discriminators. We use the Cubemap Projection (CMP) format as network input. The generators employ gated convolutions to distinguish valid pixels from invalid ones, while a side branch uses a contextual reconstruction (CR) loss to guide the generators to find the most suitable reference patch for inpainting the missing region. The proposed method is compared with state-of-the-art (SOTA) methods on the SUN360 Street View dataset in terms of PSNR and SSIM. Experimental results and an ablation study demonstrate that the proposed method outperforms SOTA methods both quantitatively and qualitatively.
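To make the gated-convolution idea concrete, the following is a minimal single-channel NumPy sketch of the general mechanism: one convolution produces features, a second produces a sigmoid gate in (0, 1), and the output is their element-wise product. The leaky-ReLU feature activation, the kernel values, and the names `conv2d_same` and `gated_conv2d` are assumptions for illustration; the paper's layers are learned, multi-channel, and may differ in detail.

```python
import numpy as np


def conv2d_same(x, kernel):
    """Zero-padded 'same' 2D cross-correlation for one channel (odd kernel size)."""
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out


def gated_conv2d(x, feature_kernel, gate_kernel, gate_bias=0.0):
    """Return (output, gate), where output = activation(features) * sigmoid(gate logits)."""
    raw = conv2d_same(x, feature_kernel)
    features = np.where(raw > 0, raw, 0.2 * raw)  # leaky ReLU (assumed activation)
    gate_logits = conv2d_same(x, gate_kernel) + gate_bias
    gate = 1.0 / (1.0 + np.exp(-gate_logits))     # soft per-pixel validity weight
    return features * gate, gate


# Toy example: an all-ones 5x5 input, a summing feature kernel, and a
# zero gate kernel, so every gate value equals sigmoid(0) = 0.5.
x = np.ones((5, 5))
out, gate = gated_conv2d(x, np.ones((3, 3)), np.zeros((3, 3)))
```

Because the gate is computed from the input itself, a trained layer can learn to suppress responses that come from masked pixels instead of relying on a fixed binary mask.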

Paper Structure

This paper contains 9 sections, 7 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Our proposed panoramic image inpainting framework takes a panoramic image in CMP format as input. It uses two generators (Face Generator and Cube Generator) with gated convolution to restore the corrupted images, where the side branch guides the generator to find the most suitable reference patches. Then, each face is fed into the Slice Discriminator to judge authenticity. Finally, the six faces are simultaneously fed into the Whole Discriminator for correlation evaluation.
  • Figure 2: Qualitative comparison of the proposed method with SOTA methods. Best viewed with zoom-in.
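Figure 1 takes the panorama in CMP format. As a rough illustration of what that input looks like, the sketch below samples one cube face from an equirectangular (ERP) image by nearest-neighbour lookup. The axis convention (x right, y up, z forward), the face orientations, the ERP layout (front direction at the image centre, row 0 at the north pole), and the function name `erp_to_cube_face` are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def erp_to_cube_face(erp, face, size):
    """Sample a size x size cube face from an ERP image (nearest neighbour)."""
    h, w = erp.shape[:2]
    # Pixel centres on the face plane, in [-1, 1].
    t = (np.arange(size) + 0.5) / size * 2.0 - 1.0
    u, v = np.meshgrid(t, -t)  # u grows to the right, v grows upward
    one = np.ones_like(u)
    # Viewing direction (x, y, z) for every face pixel (assumed orientations).
    directions = {
        "front": (u, v, one),
        "back": (-u, v, -one),
        "right": (one, v, -u),
        "left": (-one, v, u),
        "top": (u, one, -v),
        "bottom": (u, -one, v),
    }
    x, y, z = directions[face]
    lon = np.arctan2(x, z)               # longitude in [-pi, pi], 0 at the front
    lat = np.arctan2(y, np.hypot(x, z))  # latitude in [-pi/2, pi/2]
    col = ((lon / (2.0 * np.pi) + 0.5) * w).astype(int) % w
    row = np.clip(((0.5 - lat / np.pi) * h).astype(int), 0, h - 1)
    return erp[row, col]


# Toy ERP images whose pixel values are their own column / row indices.
cols = np.tile(np.arange(16), (8, 1))
rows = np.tile(np.arange(8)[:, None], (1, 16))

front = erp_to_cube_face(cols, "front", 4)
back = erp_to_cube_face(cols, "back", 4)
top = erp_to_cube_face(rows, "top", 4)
```

In this toy setup the front face reads from the central ERP columns, the back face wraps around the left and right image borders, and the top face reads from the first few rows, which is the heavily stretched polar region of the ERP image.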