TransRef: Multi-Scale Reference Embedding Transformer for Reference-Guided Image Inpainting
Taorong Liu, Liang Liao, Delin Chen, Jing Xiao, Zheng Wang, Chia-Wen Lin, Shin'ichi Satoh
TL;DR
This paper tackles reference-guided image inpainting for large, irregular holes with TransRef, a multi-scale transformer framework that progressively embeds reference information through two modules, Ref-PA (patch alignment and style harmonization) and Ref-PT (a reference patch transformer), so that the reference guidance coheres with the corrupted content. The network combines a hierarchical encoder-decoder with a convolutional tail and is trained with a joint loss of $\mathcal{L}_1$, perceptual, and style terms, which target pixel accuracy and perceptual quality respectively. To support research in this area, the authors introduce DPED50K, an open benchmark of 50K input-reference pairs for training and 2K pairs for testing, built from real-world scenes using SIFT matching. Experiments show that TransRef outperforms state-of-the-art methods on standard metrics, especially for large holes, and the method also shows promise for object removal and for cloud removal in remote sensing. The work advances reference-guided restoration with a scalable transformer-based solution and a dataset intended to foster further development.
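To make the joint objective concrete, here is a minimal NumPy sketch of $\mathcal{L} = \lambda_1\mathcal{L}_1 + \lambda_p\mathcal{L}_{perc} + \lambda_s\mathcal{L}_{style}$, assuming the common formulation in which the perceptual term compares feature maps of a fixed pretrained network and the style term compares their Gram matrices. The weights `W_L1`, `W_PERC`, `W_STYLE`, the Gram normalization, and all function names are hypothetical choices for illustration, not values or code from TransRef. In practice the feature lists would come from a pretrained network (for example VGG) rather than the random stand-ins used below.

```python
import numpy as np

# Hypothetical placeholder weights; the source does not report the actual values.
W_L1, W_PERC, W_STYLE = 1.0, 0.1, 250.0

def l1_loss(pred, target):
    """Mean absolute error between two arrays of the same shape."""
    return float(np.mean(np.abs(pred - target)))

def gram_matrix(feat):
    """Gram matrix of a (C, H, W) feature map, normalized by C*H*W."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def perceptual_loss(feats_pred, feats_gt):
    """Sum of L1 distances between corresponding feature maps."""
    return sum(l1_loss(a, b) for a, b in zip(feats_pred, feats_gt))

def style_loss(feats_pred, feats_gt):
    """Sum of L1 distances between Gram matrices of corresponding feature maps."""
    return sum(l1_loss(gram_matrix(a), gram_matrix(b))
               for a, b in zip(feats_pred, feats_gt))

def joint_loss(pred, target, feats_pred, feats_gt):
    """Weighted sum of pixel-wise L1, perceptual, and style terms."""
    return (W_L1 * l1_loss(pred, target)
            + W_PERC * perceptual_loss(feats_pred, feats_gt)
            + W_STYLE * style_loss(feats_pred, feats_gt))

# Usage with random stand-ins for images and for features of a fixed network.
rng = np.random.default_rng(0)
gt = rng.random((3, 8, 8))
out = gt + 0.05 * rng.standard_normal(gt.shape)
feats_gt = [rng.random((4, 4, 4))]
feats_out = [f + 0.01 for f in feats_gt]
print(joint_loss(out, gt, feats_out, feats_gt))
```

The pixel term alone rewards blurry averages inside large holes; the feature and Gram terms penalize missing texture and mismatched style, which is the usual motivation for combining the three.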
Abstract
Completing corrupted images with complicated semantic content and diverse hole patterns remains challenging, even for state-of-the-art learning-based inpainting methods trained on large-scale data. A reference image that captures the same scene as the corrupted image offers informative guidance for completion, because it shares texture and structure priors similar to those of the missing regions. In this work, we propose TransRef, a transformer-based encoder-decoder network for reference-guided image inpainting. Specifically, guidance is applied progressively through a reference embedding procedure, in which the reference features are successively aligned and fused with the features of the corrupted image. To use the reference features precisely, a reference-patch alignment (Ref-PA) module aligns the patch features of the reference and corrupted images and harmonizes their style differences, while a reference-patch transformer (Ref-PT) module refines the embedded reference features. Moreover, to facilitate research on reference-guided image restoration, we construct a publicly accessible benchmark dataset containing 50K pairs of input and reference images. Quantitative and qualitative evaluations both demonstrate the value of the reference information and the advantage of the proposed method over state-of-the-art methods in completing complex holes. Code and dataset are available at https://github.com/Cameltr/TransRef.
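The abstract does not specify how Ref-PA performs alignment and harmonization. The NumPy sketch below shows one generic way such a step could work, under stated assumptions: patch features are flattened to rows, each input patch gathers a softmax-weighted mix of reference patches ranked by cosine similarity, and style harmonization is approximated by matching per-channel mean and standard deviation (as in AdaIN) to the uncorrupted input patches. All names, the temperature `tau`, and the mask `valid` are hypothetical; this is not the authors' module.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

def align_reference(inp, ref, tau=0.1):
    """Soft patch matching (assumed design, not the paper's exact Ref-PA).

    inp: (N, C) patch features of the corrupted image.
    ref: (M, C) patch features of the reference image.
    Returns (N, C): for each input patch, a similarity-weighted mix of reference patches.
    """
    sim = l2_normalize(inp) @ l2_normalize(ref).T   # (N, M) cosine similarities
    weights = softmax(sim / tau, axis=1)            # each row sums to 1
    return weights @ ref

def harmonize(aligned, inp, valid, eps=1e-5):
    """Shift per-channel mean/std of `aligned` to those of the known input patches.

    valid: (N,) boolean mask marking uncorrupted input patches (hypothetical input).
    """
    known = inp[valid]
    mu_a, sd_a = aligned.mean(axis=0), aligned.std(axis=0) + eps
    mu_k, sd_k = known.mean(axis=0), known.std(axis=0) + eps
    return (aligned - mu_a) / sd_a * sd_k + mu_k

# Usage: the reference differs in "style" (scale and offset) from the input.
rng = np.random.default_rng(0)
inp = rng.standard_normal((6, 4))
ref = 3.0 * rng.standard_normal((5, 4)) + 2.0
valid = np.array([True, True, True, True, False, False])
fused_ref = harmonize(align_reference(inp, ref), inp, valid)
print(fused_ref.shape)
```

Soft weighting keeps the step differentiable; a hard nearest-neighbor match (an argmax over `sim`) is a possible alternative. The source confirms neither choice.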
