Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration

Jingyun Xue; Tao Wang; Pengwen Dai; Kaihao Zhang

TL;DR

This paper addresses the restoration of Under-Display Camera (UDC) images, which are degraded by diffraction, color shifts, and low light. It introduces SGSFormer, an asymmetric U-Net-style Transformer whose sparse transformer blocks use Segmentation Guided Sparse Attention: segmentation priors from SAM steer the attention so that it filters out noise and focuses on the regions that need reconstruction. The key contributions are the Segmentation Guided Sparse Attention mechanism, a lightweight encoder variant, and a Mixed Gated-Dconv FFN, supported by extensive ablations on the P-OLED and T-OLED datasets that show improved perceptual metrics and competitive pixel-wise restoration. By combining global contextual reasoning with targeted local refinement, the approach achieves higher perceptual quality and more accurate color recovery, both of which matter for practical full-screen displays on mobile devices.

Abstract

Under-Display Camera (UDC) is an emerging technology that achieves a full-screen display by hiding the camera under the display panel. However, current UDC implementations cause serious image degradation: the incident light required for imaging is attenuated and diffracted as it passes through the display panel, which leads to various artifacts. Prevailing UDC image restoration methods predominantly use convolutional neural network architectures, whereas Transformer-based methods have shown superior performance in the majority of image restoration tasks. This is attributed to the Transformer's ability to sample global features for the local reconstruction of images, thereby achieving high-quality restoration. In this paper, we observe that when a Vision Transformer is used to restore UDC-degraded images, global attention samples a large amount of redundant information and noise. Compared with an ordinary Transformer that employs dense attention, a Transformer that uses sparse attention can alleviate the adverse impact of this redundant information and noise. Building on this observation, we propose a Segmentation Guided Sparse Transformer (SGSFormer) for restoring high-quality images from UDC-degraded images. Specifically, we use sparse self-attention to filter out redundant information and noise, directing the model's attention to the features more relevant to the degraded regions that need reconstruction. Moreover, we integrate the instance segmentation map as prior information to guide the sparse self-attention in filtering and focusing on the correct regions.
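The contrast the abstract draws between dense attention, sparse attention, and segmentation-guided sparse attention can be illustrated with a small, self-contained sketch. This is not the SGSFormer implementation. It assumes that sparsity is obtained by keeping only the top-k attention scores per query, and that the segmentation prior acts as an additive bonus for keys in the same segment as the query. The names `attention_weights`, `top_k`, `seg_ids`, `q_seg`, and `seg_bonus` are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    # Numerically stable softmax; entries equal to -inf receive zero weight.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q, keys, top_k=None, seg_ids=None, q_seg=None, seg_bonus=0.0):
    """Attention weights of one query vector over a list of key vectors.

    top_k=None gives dense attention. Otherwise only the top_k highest
    scores are kept and the rest are masked out before the softmax.
    seg_bonus is an assumed, illustrative way to inject a segmentation
    prior: it is added to keys whose segment id equals q_seg.
    """
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be at least 1")
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    if seg_ids is not None:
        scores = [s + (seg_bonus if sid == q_seg else 0.0)
                  for s, sid in zip(scores, seg_ids)]
    if top_k is not None and top_k < len(scores):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        keep = set(ranked[:top_k])
        scores = [s if i in keep else float("-inf") for i, s in enumerate(scores)]
    return softmax(scores)


q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
seg_ids = [0, 1, 0, 1]  # toy segment label for each key position

dense = attention_weights(q, keys)             # every key gets some weight
sparse = attention_weights(q, keys, top_k=2)   # keys 2 and 3 get zero weight
guided = attention_weights(q, keys, top_k=2, seg_ids=seg_ids,
                           q_seg=0, seg_bonus=2.0)  # key 2 replaces key 1
```

In this toy setting, dense attention spreads weight over all four keys, top-k attention discards the two least similar keys, and the segmentation bonus changes which keys survive the top-k filter by favoring keys in the query's own segment.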

Paper Structure (22 sections, 20 equations, 10 figures, 4 tables)

Introduction
Related Work
UDC Image Degeneration
UDC Image Restoration
Vision Transformer
Sparse Transformer
Methodology
Overall Pipeline
Instance Segmentation Feature Extraction Module
Decoder Block
Encoder Block
Loss Function
Experiment
Implementation Details
Comparisons with SOTA Methods
...and 7 more sections

Figures (10)

Figure 1: Structural diagrams of a regular OLED screen and two forms of UDC screen, with the incident light direction indicated on the left and the light-emitting direction on the right. (a) The fundamental structure of an OLED screen panel. (b) A UDC screen panel that reduces the pixel density in the screen region above the camera. (c) A UDC screen panel that uses smaller pixels in the screen region above the camera.
Figure 2: Illustration of the UDC imaging systems for the P-OLED and T-OLED datasets. In the P-OLED optical system, sub-pixels are densely arranged on a yellow substrate, which leads to low transmittance; consequently, the images exhibit significant color deviation and low-light degradation. In the T-OLED optical system, the sub-pixels are sparsely arranged and there is no substrate, which results in high transmittance; the images are blurred by diffraction.
Figure 3: Illustration of the various attention receptive fields. (a) Dense attention uniformly perceives every region of the global context. (b) Sparse attention selectively perceives regions with higher relevance. (c) Segmentation Guided Sparse Attention judges more reliably which regions are most correlated and then perceives those regions. (d) Decoders alternate between Segmentation Guided Sparse Attention and dense attention, and thereby obtain a more comprehensive receptive field.
Figure 4: Overview of SGSFormer, an encoder-decoder architecture with residual connections. There are four encoder layers with down-sampling, four decoder layers with up-sampling, and one latent block, represented by blue, green, and red squares respectively. The yellow square represents the Segmentation Feature Extraction Module, which provides segmentation features at different scales to each encoder and decoder.
Figure 5: The components of the instance segmentation feature extraction module. The upper part is a fine-tuned instance segmentation network that outputs a segmentation map. The lower part is the feature transform layer, which transforms the segmentation map into multi-scale feature matrices and feeds them into the backbone network.
...and 5 more figures
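
The captions of Figures 4 and 5 describe a segmentation map being turned into features at several scales, one per encoder or decoder level. The sketch below illustrates only the multi-scale part of that idea, under loud assumptions: it uses plain nearest-neighbour downsampling of an integer label map by a factor of 2 per level, whereas the paper's feature transform layer is presumably learned. The function names `downsample_labels` and `segmentation_pyramid`, and the default of four scales (chosen to match the four encoder layers in Figure 4), are hypothetical.

```python
def downsample_labels(seg_map, factor):
    """Nearest-neighbour downsampling of a 2-D label map by an integer factor.

    Labels are picked, not averaged, because averaging instance ids
    would produce values that correspond to no real segment.
    """
    return [row[::factor] for row in seg_map[::factor]]


def segmentation_pyramid(seg_map, num_scales=4):
    """Return copies of seg_map at full, 1/2, 1/4, ... resolution."""
    pyramid = [seg_map]
    for _ in range(num_scales - 1):
        pyramid.append(downsample_labels(pyramid[-1], 2))
    return pyramid


# Toy 8x8 segmentation map: left half is segment 0, right half is segment 1.
seg = [[0] * 4 + [1] * 4 for _ in range(8)]
pyramid = segmentation_pyramid(seg)
shapes = [(len(level), len(level[0])) for level in pyramid]
# shapes is [(8, 8), (4, 4), (2, 2), (1, 1)]
```

Each level of `pyramid` could then be matched to the backbone stage with the same spatial resolution. How the real model embeds and fuses these maps is not specified in this summary.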
