RetCompletion: High-Speed Inference Image Completion with Retentive Network

Yueyang Cang; Pingge Hu; Xiaoteng Zhang; Xingtong Wang; Yuhang Liu; Li Shi

TL;DR

RetCompletion tackles the bottleneck of slow inference in pluralistic image completion by adapting RetNet to vision with a two-stage pipeline. It introduces Bi-RetNet to fuse bidirectional context for coherent low-resolution priors and employs pixel-wise inference to enable rapid updates, followed by CNN-based guided upsampling for texture. On ImageNet and CelebA-HQ, it delivers substantial speedups over ICT and RePaint while maintaining competitive quality, validated by quantitative metrics and a user study. This work broadens the applicability of RetNet in computer vision and offers a practical solution for real-time pluralistic inpainting.

Abstract

Time cost is a major challenge in achieving high-quality pluralistic image completion. The Retentive Network (RetNet), recently introduced in natural language processing, offers a new approach to this problem through its low-cost inference. Inspired by this, we apply RetNet to the pluralistic image completion task in computer vision. We present RetCompletion, a two-stage framework. In the first stage, we introduce Bi-RetNet, a bidirectional sequence information fusion model that integrates contextual information from images. During inference, we employ a unidirectional pixel-wise update strategy to restore consistent image structures, achieving both high reconstruction quality and fast inference speed. In the second stage, we use a CNN to upsample the low-resolution result and enhance texture details. Experiments on ImageNet and CelebA-HQ demonstrate that our inference speed is 10$\times$ faster than ICT and 15$\times$ faster than RePaint. The proposed RetCompletion significantly improves inference speed and delivers strong performance.

Paper Structure (22 sections, 10 equations, 3 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Pluralistic Image Completion
Retentive Network
Methods
Preprocessing
Feature Encoding
Position Encoding
Appearance Priors Reconstruction by Bi-RetNet
Multi-Head Forward-RetNet
Multi-Head Backward-RetNet
Feature Fusion
Loss Function
Parallel Training
Pixel-wise Inference
...and 7 more sections

Figures (3)

Figure 1: Pipeline Overview. Our method consists of two networks, which are trained separately. Based on the Bi-RetNet, the first network is employed for completing low-dimensional images. A parallel representation is utilized during training, predicting all pixels simultaneously to expedite the training process. In contrast, during inference, a recurrent representation is employed, predicting one pixel at a time to enhance the quality of the generated image. The second network, built on a CNN architecture, comprises an encoder, a decoder, and multiple residual blocks. Its primary function is to restore high-dimensional images from their low-dimensional counterparts.
Figure 2: Sample images for user study.
Figure 3: Comparison of user study results and inference time.
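
The Figure 1 caption contrasts a parallel representation (training) with a recurrent representation (inference). The sketch below illustrates why the two can be swapped, using the basic single-head retention operator from the original RetNet formulation: the parallel form weights each query-key score by a decay factor gamma^(n-m) under a causal mask, and the recurrent form carries a state S_n = gamma * S_{n-1} + k_n^T v_n. This is a simplified illustration only, not the authors' Bi-RetNet: it omits position rotation, normalization, gating, multiple heads, and the backward branch and fusion step. The function names and the toy inputs are assumptions made for this example.

```python
def retention_parallel(Q, K, V, gamma):
    """Parallel form: o_n = sum_{m<=n} gamma^(n-m) * (q_n . k_m) * v_m."""
    n, d_v = len(Q), len(V[0])
    out = []
    for i in range(n):
        row = [0.0] * d_v
        for j in range(i + 1):  # causal mask: only positions j <= i contribute
            score = sum(q * k for q, k in zip(Q[i], K[j])) * gamma ** (i - j)
            for c in range(d_v):
                row[c] += score * V[j][c]
        out.append(row)
    return out


def retention_recurrent(Q, K, V, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n, then o_n = q_n S_n."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # fixed-size state, independent of sequence length
    out = []
    for q, k, v in zip(Q, K, V):
        # Decay the old state and add the outer product of the new key and value.
        S = [[gamma * S[a][c] + k[a] * v[c] for c in range(d_v)] for a in range(d_k)]
        out.append([sum(q[a] * S[a][c] for a in range(d_k)) for c in range(d_v)])
    return out


# Toy inputs (hypothetical values): 4 positions, key dimension 3, value dimension 2.
Q = [[0.1 * (i + a + 1) for a in range(3)] for i in range(4)]
K = [[0.2 * (i - a) for a in range(3)] for i in range(4)]
V = [[1.0 + i, 0.5 * i] for i in range(4)]

par = retention_parallel(Q, K, V, gamma=0.9)
rec = retention_recurrent(Q, K, V, gamma=0.9)
max_diff = max(abs(x - y) for rp, rr in zip(par, rec) for x, y in zip(rp, rr))
print("largest difference between the two forms:", max_diff)
```

Because the recurrent form only updates a fixed-size state per step, each newly predicted pixel costs the same amount of work regardless of how many pixels precede it, which is the property that makes one-pixel-at-a-time inference inexpensive.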
