HI-GAN: Hierarchical Inpainting GAN with Auxiliary Inputs for Combined RGB and Depth Inpainting
Ankan Dash, Jingyi Gu, Guiling Wang
TL;DR
This work tackles RGBD inpainting for mixed reality, where ToF depth maps often contain missing pixels and misalignments with RGB data. It introduces HI-GAN, an end-to-end framework that jointly trains three GANs: EdgeGAN, LabelGAN, and CombinedRGBD-GAN. EdgeGAN and LabelGAN act as auxiliary-input regularizers whose latent encodings are fused into the main RGBD inpainting network. A hierarchical optimization scheme further refines these regularizers inside CombinedRGBD-GAN, and the authors demonstrate that incorporating edge information, and especially segmentation label information, yields significant gains in both RGB and depth restoration without resorting to attention mechanisms. Evaluations on SUN-RGBD with irregular masks show HI-GAN achieving higher image quality scores (e.g., SSIM, PSNR) and lower reconstruction errors than baselines such as PConv and EdgeConnect, supporting the effectiveness of object-aware and boundary-guided RGBD inpainting for real-world MR/DR applications. The proposed approach offers a practical, end-to-end solution that leverages readily available RGBD data to improve scene plausibility in mixed reality scenarios.
Abstract
Inpainting involves filling in missing pixels or areas in an image. It is a crucial technique in Mixed Reality environments, particularly in Diminished Reality (DR), where content is removed from a user's visual environment. Existing methods rely on digital replacement techniques that require multiple cameras and incur high costs. AR devices and smartphones use ToF depth sensors to capture scene depth maps aligned with RGB images. Despite their speed and affordability, ToF cameras produce imperfect depth maps with missing pixels. To address these challenges, we propose Hierarchical Inpainting GAN (HI-GAN), a novel approach comprising three GANs arranged hierarchically for RGBD inpainting. EdgeGAN and LabelGAN inpaint masked edge and segmentation label images respectively, while CombinedRGBD-GAN combines their latent representation outputs and performs RGB and depth inpainting. Edge images, and particularly segmentation label images, used as auxiliary inputs significantly enhance inpainting performance by providing complementary context and through hierarchical optimization. We believe this is the first attempt to incorporate label images into the inpainting process. Unlike previous approaches that require multiple sequential models and separate outputs, our work operates in an end-to-end manner, training all three models simultaneously and hierarchically. Specifically, EdgeGAN and LabelGAN are first optimized separately and then further optimized inside CombinedRGBD-GAN to enhance inpainting quality. Experiments demonstrate that HI-GAN works seamlessly and achieves overall superior performance compared with existing approaches.
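The hierarchical optimization described above can be illustrated with a deliberately small toy. This is a sketch under strong assumptions, not the authors' implementation: each GAN is reduced to a single scalar weight, the adversarial objectives are replaced by squared-error losses, and the latent fusion is assumed to be plain concatenation. All names (`fuse`, `train_step`, `w_edge`, `w_label`, `w_rgbd`, `head`) are hypothetical. The only point it makes is which parameters each stage updates: the auxiliary encoders are first updated on their own targets, then updated again by the gradient of the combined RGBD loss.

```python
# Toy sketch (not the authors' implementation): scalar "networks" and
# squared-error losses stand in for the GANs and their adversarial objectives.


def fuse(z_edge, z_label, z_rgbd):
    """Fuse latents by concatenation (an assumed fusion rule)."""
    return [z_edge, z_label, z_rgbd]


def train_step(params, batch, lr=0.01):
    """One hierarchical step: auxiliary updates first, then the joint update."""
    p = dict(params)  # shallow copy so the caller's dict is left untouched

    # Stage 1: each auxiliary encoder is optimized on its own target.
    for name, x_key, t_key in (("w_edge", "edge", "edge_target"),
                               ("w_label", "label", "label_target")):
        z = p[name] * batch[x_key]
        grad = 2.0 * (z - batch[t_key]) * batch[x_key]
        p[name] -= lr * grad

    # Stage 2: the combined loss updates the head AND both auxiliary
    # encoders, since its gradient flows back through the fused latents.
    inputs = (batch["edge"], batch["label"], batch["rgbd"])
    z = fuse(p["w_edge"] * inputs[0],
             p["w_label"] * inputs[1],
             p["w_rgbd"] * inputs[2])
    head = list(p["head"])
    y = sum(v * zi for v, zi in zip(head, z))
    err = 2.0 * (y - batch["rgbd_target"])  # d(loss)/dy for squared error
    for i, name in enumerate(("w_edge", "w_label", "w_rgbd")):
        p[name] -= lr * err * head[i] * inputs[i]
    p["head"] = [v - lr * err * zi for v, zi in zip(head, z)]
    return p
```

In this toy, `w_edge` and `w_label` receive two updates per step (one from their own loss, one from the combined loss), whereas `w_rgbd` and `head` receive only the combined update. That mirrors the "optimized separately, then further optimized inside CombinedRGBD-GAN" scheme in spirit only; the actual architectures, loss terms, and fusion mechanism are not specified in this section.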
