Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration
Jingyun Xue, Tao Wang, Pengwen Dai, Kaihao Zhang
TL;DR
This paper addresses the restoration of Under-Display Camera (UDC) images degraded by diffraction, color shifts, and low light. It introduces SGSFormer, an asymmetric U-Net style Transformer whose Segmentation Guided Sparse Attention filters out noise and focuses on regions needing reconstruction, with segmentation priors from SAM guiding the attention in the sparse transformer blocks. Key contributions include the Segmentation Guided Sparse Attention mechanism, a lightweight encoder variant, and a Mixed Gated-Dconv FFN. Extensive ablations on the P-OLED and T-OLED datasets show improved perceptual metrics and competitive pixel-wise restoration. By combining global contextual reasoning with targeted local refinement, the approach yields higher perceptual quality and more accurate color recovery, both of which matter for practical full-screen displays on mobile devices.
Abstract
Under-Display Camera (UDC) is an emerging technology that achieves a full-screen display by hiding the camera under the display panel. However, current UDC implementations cause serious image degradation: the incident light required for imaging is attenuated and diffracted as it passes through the display panel, producing various artifacts. Prevailing UDC image restoration methods predominantly use convolutional neural network architectures, whereas Transformer-based methods have shown superior performance in the majority of image restoration tasks. This is attributed to the Transformer's capability to sample global features for the local reconstruction of images, thereby achieving high-quality restoration. In this paper, we observe that when a Vision Transformer is used to restore UDC degraded images, global attention samples a large amount of redundant information and noise. Furthermore, compared with an ordinary Transformer employing dense attention, a Transformer using sparse attention can alleviate the adverse impact of this redundant information and noise. Building on this observation, we propose a Segmentation Guided Sparse Transformer (SGSFormer) for restoring high-quality images from UDC degraded images. Specifically, we use sparse self-attention to filter out redundant information and noise, directing the model's attention toward the features more relevant to the degraded regions in need of reconstruction. Moreover, we integrate the instance segmentation map as prior information to guide the sparse self-attention in filtering and focusing on the correct regions.
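
To make the contrast between dense and sparse attention concrete, the following is a minimal, illustrative sketch and not the SGSFormer implementation. The abstract does not specify how sparsity or the segmentation prior is realized, so two assumptions are made here: sparsity is modeled as top-k selection of attention scores per query, and segmentation guidance is modeled as an additive bonus for key tokens that share the query's segment label. The names `sparse_attention`, `top_k`, and `seg_bonus` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_attention(q, k, v, seg_ids, top_k=2, seg_bonus=1.0):
    """Top-k sparse self-attention with a segmentation bias (illustrative only).

    q, k, v  : lists of token vectors (lists of floats), one per token
    seg_ids  : one segment label per token (assumed to come from a segmenter)
    top_k    : number of keys each query is allowed to attend to (assumption)
    seg_bonus: score added when query and key share a segment (assumption)
    """
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        # Scaled dot-product scores, biased toward same-segment keys.
        scores = []
        for j in range(n):
            s = sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            if seg_ids[i] == seg_ids[j]:
                s += seg_bonus
            scores.append(s)
        # Keep only the top_k highest-scoring keys; all others are dropped.
        keep = sorted(range(n), key=lambda j: scores[j], reverse=True)[:top_k]
        weights = softmax([scores[j] for j in keep])
        # Weighted sum of the retained value vectors.
        row = [0.0] * len(v[0])
        for w, j in zip(weights, keep):
            for c in range(len(v[0])):
                row[c] += w * v[j][c]
        out.append(row)
    return out


if __name__ == "__main__":
    # Four tokens in two segments; identical queries and keys, distinct values.
    q = k = [[1.0, 0.0]] * 4
    v = [[1.0, 0.0], [3.0, 0.0], [0.0, 10.0], [0.0, 20.0]]
    print(sparse_attention(q, k, v, seg_ids=[0, 0, 1, 1], top_k=2))
```

In this toy setting every query-key score is identical before the bias, so the segment bonus alone decides which keys survive the top-k filter, and each token aggregates values only from its own segment. Setting `top_k` to the number of tokens recovers dense attention with the same bias.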
