RETHINED: A New Benchmark and Baseline for Real-Time High-Resolution Image Inpainting On Edge Devices

Marcelo Sanchez, Gil Triginer, Ignacio Sarasua, Lara Raad, Coloma Ballester

TL;DR

Real-time high-resolution image inpainting on edge devices is challenging due to memory and latency constraints. The authors propose RETHINED, a lightweight pipeline that combines a CNN-based coarse restoration with a NeuralPatchMatch texture refinement and an attention-guided upscaling step to generate HR inpainting, aided by model re-parameterization for latency reduction. Key contributions include the first real-time HR on-edge baseline, the NeuralPatchMatch mechanism with an attention transfer module, and the DF8K-Inpainting dataset for free-form HR masks. The work demonstrates up to 100x speedups over prior mobile methods while maintaining competitive LR quality and superior HR detail, enabling practical deployment of HR inpainting on diverse edge devices and providing a new benchmark for future research.

Abstract

Existing image inpainting methods have shown impressive completion results for low-resolution images. However, most of these algorithms fail at high resolutions and require powerful hardware, limiting their deployment on edge devices. Motivated by this, we propose the first baseline for REal-Time High-resolution image INpainting on Edge Devices (RETHINED), which is able to inpaint at ultra-high resolution and can run in real time ($\leq 30$ ms) on a wide variety of mobile devices. Our method is simple yet effective: a lightweight Convolutional Neural Network (CNN) recovers structure, followed by a resolution-agnostic patch replacement mechanism that provides detailed texture. Specifically, our pipeline leverages the structural capacity of CNNs and the high level of detail of patch-based methods, which is a key component for high-resolution image inpainting. To demonstrate the real-world applicability of our method, we conduct an extensive analysis on various mobile-friendly devices and demonstrate similar inpainting performance while being $100\times$ faster than existing state-of-the-art methods. Furthermore, we release DF8K-Inpainting, the first free-form mask UHD inpainting dataset.

Paper Structure

This paper contains 21 sections, 4 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Left: Inpainting result on ultra-high-resolution images (best viewed zoomed in on screen). Right: Comparison of LPIPS performance and latency among different state-of-the-art methods.
  • Figure 2: Proposed Inpainting Pipeline. Given an HR image $\mathbf{y}$ and a binary mask $\mathbf{m}$ marking the corrupted pixels as inputs (left), our model first downsamples $\mathbf{x} = \mathbf{y} \odot \mathbf{m}$ to a lower resolution $\mathbf{x}_{\text{LR}}$ and forwards it to the coarse model $f_{\theta}$, obtaining $\hat{\mathbf{x}}_{\text{coarse}}$. This coarse result is then refined by the NeuralPatchMatch module, which yields $\hat{\mathbf{x}}_{\text{LR}}$ and the attention map $\mathbf{A}$. From $\mathbf{A}$ and $\mathbf{x}$, our Attention Upscaling module produces $\hat{\mathbf{x}}_{\text{HR}}$.
  • Figure 3: Comparison of different inpainting methods able to run on mobile devices. Latency appears in parentheses and was measured at $2048 \times 2048$ resolution on an Apple M2 iPad Pro.
  • Figure 4: Proposed NeuralPatchMatch Inpainting Module. (Corrupted patches are displayed in red and uncorrupted ones in green.) First, we project the patches to an embedding space of dimension $d_{k}$ (see the NeuralPatchMatch section). Token similarity is then computed in a self-attention manner, yielding the attention map $\mathbf{A}$ (lighter colors correspond to large softmax values, darker colors to low values). The self-attention masking restricts inpainting to corrupted regions, preserving high-frequency details in uncorrupted zones. To obtain the final inpainted image, we mix the tokens via a weighted sum based on the attention map $\mathbf{A}$.
  • Figure 5: Inpainting results of our proposed method at increasingly high resolutions, shown at 15x zoom. Our method is able to correctly inpaint images at any given resolution, making it suitable for real-world applications.
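
The mechanism described in the captions of Figures 2 and 4 (masked self-attention over patch tokens, followed by reuse of the attention map $\mathbf{A}$ at high resolution) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the helper names (`to_patches`, `from_patches`, `masked_attention`), the random projection matrices standing in for learned weights, the mean-fill standing in for the coarse model $f_{\theta}$, the strided downsampling, and the convention that a mask value of 1 means "uncorrupted" are all assumptions made for illustration.

```python
import numpy as np

def to_patches(img, p):
    """Split an (H, W, C) image into (N, p*p*C) non-overlapping patch tokens."""
    H, W, C = img.shape
    gh, gw = H // p, W // p
    x = img[:gh * p, :gw * p].reshape(gh, p, gw, p, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(gh * gw, p * p * C)

def from_patches(tokens, p, H, W, C):
    """Inverse of to_patches: (N, p*p*C) tokens back to an (H, W, C) image."""
    gh, gw = H // p, W // p
    x = tokens.reshape(gh, gw, p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * p, gw * p, C)

def masked_attention(tokens, corrupted, W_q, W_k):
    """Attention map A of shape (N, N) over patch tokens.

    Corrupted patches may only attend to uncorrupted ones; uncorrupted
    patches get an identity row, so they are copied through unchanged.
    Assumes at least one uncorrupted patch exists.
    """
    d_k = W_q.shape[1]
    logits = (tokens @ W_q) @ (tokens @ W_k).T / np.sqrt(d_k)
    logits[:, corrupted] = -np.inf             # corrupted keys are never sources
    logits -= logits.max(axis=1, keepdims=True)
    A = np.exp(logits)
    A /= A.sum(axis=1, keepdims=True)          # row-wise softmax
    A[~corrupted] = np.eye(len(tokens))[~corrupted]
    return A

# --- toy setup (all sizes and values are hypothetical) ---
rng = np.random.default_rng(0)
p, s, C, d_k = 2, 2, 3, 8                      # LR patch size, scale factor, channels
H_hr = W_hr = 16
y_hr = rng.random((H_hr, W_hr, C))             # ground-truth HR image y
valid_hr = np.ones((H_hr, W_hr, 1))            # assumed convention: 1 = uncorrupted
valid_hr[4:12, 4:12] = 0                       # square hole aligned to HR patches
x_hr = y_hr * valid_hr                         # x = y ⊙ m

# Low-resolution branch: naive strided downsampling plus a mean fill
# that stands in for the coarse CNN output.
x_lr = x_hr[::s, ::s].copy()
valid_lr = valid_hr[::s, ::s, 0] == 1
x_lr[~valid_lr] = x_lr[valid_lr].mean(axis=0)

# A patch counts as corrupted if any of its pixels is masked.
corrupted = to_patches(valid_hr, p * s).min(axis=1) == 0

W_q = rng.standard_normal((p * p * C, d_k))    # stand-ins for learned projections
W_k = rng.standard_normal((p * p * C, d_k))
A = masked_attention(to_patches(x_lr, p), corrupted, W_q, W_k)

# Attention upscaling: apply the same A to HR patches (size p*s), so the
# hole is filled with a weighted sum of full-resolution uncorrupted patches.
out_hr = from_patches(A @ to_patches(x_hr, p * s), p * s, H_hr, W_hr, C)
```

Because $\mathbf{A}$ is indexed by patch position rather than by pixel, the same map applies to any resolution whose patch grid matches the low-resolution one. In this sketch that is what keeps the attention computation independent of the output size; only the final weighted sum touches HR pixels.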