
Mask Guided Gated Convolution for Amodal Content Completion

Kaziwa Saleh, Sándor Szénási, Zoltán Vámossy

TL;DR

This work tackles amodal content completion by guiding gated convolutional networks with a weighted mask that emphasizes the visible portion of an occluded object. A coarse-to-refinement architecture with contextual attention and an SN-PatchGAN discriminator synthesizes the hidden content under a self-supervised training scheme on COCOA. Empirical results show improved texture richness and semantic consistency over baselines, especially for uniformly textured occlusions, with ablations highlighting the importance of the perceptual and patch losses. The approach advances scene understanding in cluttered environments and has potential benefits for robotics, autonomous systems, and safety applications, but it struggles on objects with highly disparate parts. Overall, the paper contributes a practical weighted-mask mechanism and a robust training pipeline for amodal reconstruction.

Abstract

We present a model to reconstruct partially visible objects. The model takes a mask as an input, which we call a weighted mask. The mask is utilized by gated convolutions to assign more weight to the visible pixels of the occluded instance compared to the background, while ignoring the features of the invisible pixels. By drawing more attention from the visible region, our model can predict the invisible patch more effectively than the baseline models, especially in instances with uniform texture. The model is trained on the COCOA dataset and two subsets of it in a self-supervised manner. The results demonstrate that our model generates higher quality and more texture-rich outputs compared to baseline models. Code is available at: https://github.com/KaziwaSaleh/mask-guided.
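
The weighted-mask idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the weight values (1.0, 0.5, 0.0), the function names, and the 1x1 gated layer with ReLU activation are all hypothetical choices made for illustration. The only parts taken from the text are that visible object pixels receive more weight than the background, that invisible pixels are ignored, and that the mask is an input to gated convolutions, which in DeepFill multiply an activated feature map by a sigmoid gate.

```python
import numpy as np

# Hypothetical weights: the text only states visible > background, invisible ignored.
W_VISIBLE, W_BACKGROUND, W_INVISIBLE = 1.0, 0.5, 0.0


def build_weighted_mask(visible, invisible):
    """Return an HxW float mask from two disjoint boolean HxW masks."""
    visible = np.asarray(visible, dtype=bool)
    invisible = np.asarray(invisible, dtype=bool)
    if visible.shape != invisible.shape:
        raise ValueError("masks must have the same shape")
    if np.any(visible & invisible):
        raise ValueError("a pixel cannot be both visible and invisible")
    mask = np.full(visible.shape, W_BACKGROUND, dtype=np.float32)
    mask[visible] = W_VISIBLE      # visible part of the occluded instance
    mask[invisible] = W_INVISIBLE  # occluded region to be synthesized
    return mask


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_conv1x1(x, w_feat, w_gate):
    """Toy 1x1 gated convolution: activation(feature) * sigmoid(gating).

    x: (C_in, H, W); w_feat, w_gate: (C_out, C_in). ReLU is an assumed activation.
    """
    feature = np.einsum("oc,chw->ohw", w_feat, x)
    gating = np.einsum("oc,chw->ohw", w_gate, x)
    return np.maximum(feature, 0.0) * sigmoid(gating)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    visible = np.zeros((4, 4), dtype=bool)
    invisible = np.zeros((4, 4), dtype=bool)
    visible[1:3, 0:2] = True
    invisible[1:3, 2:4] = True
    mask = build_weighted_mask(visible, invisible)

    image = rng.random((3, 4, 4)).astype(np.float32)
    # Assumption: the mask is concatenated to the image as an extra input channel.
    x = np.concatenate([image, mask[None]], axis=0)
    out = gated_conv1x1(x, rng.normal(size=(8, 4)), rng.normal(size=(8, 4)))
    print(mask)
    print(out.shape)
```

In a trained network the gating weights are learned, so the layer itself decides how strongly each location contributes; the sketch only shows where the weighted mask would enter the computation.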

Paper Structure

This paper contains 9 sections, 9 equations, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Gated convolution as used in DeepFill (Yu et al., 2019) and in ours with the weighted mask.
  • Figure 2: Our completion model with gated convolution and SN-PatchGAN as used in DeepFill (Yu et al., 2019), with an additional weighted mask.
  • Figure 3: Qualitative comparison of completed images from the COCOA-animal-M validation set. The weighted mask is used in our model; DeepFill and PCNet-C only utilize the mask of the missing region (indicated by the white pixels). (d) and (e) show results of the published version of DeepFill and the version re-trained on the used datasets, respectively.
  • Figure 4: Ablation study of our model without each loss component.