Integrating Spatial and Frequency Information for Under-Display Camera Image Restoration
Kyusu Ahn, Jinpyo Kim, Chanwoo Park, JiSoo Kim, Jaejin Lee
TL;DR
This work tackles the challenging problem of restoring images captured with under-display cameras, where degradations include noise, blur, reduced transmittance, and especially global flares. It introduces SFIM, a four-level architecture that combines CNN-based spatial processing (level 1) with FFT-based global modeling (levels 2–4), unified by an attention-driven multi-level integration block (AMIB). The approach leverages both local and global information, with a loss that combines spatial and frequency-domain terms to guide restoration. Empirical results on three UDC benchmarks demonstrate state-of-the-art performance, with strong improvements in flare suppression and texture fidelity, highlighting the value of explicitly integrating spatial and frequency information when an image suffers from both local and global degradations.
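To make the idea of a loss with spatial and frequency-domain terms concrete, here is a minimal NumPy sketch. It assumes an L1 distance in both domains and a hypothetical weight `freq_weight`; neither the distance nor the weight is stated in this summary, so this illustrates the general pattern rather than the loss SFIM actually uses.

```python
import numpy as np

def spatial_frequency_loss(pred, target, freq_weight=0.1):
    """Illustrative loss: pixel-space L1 plus L1 between 2-D FFT coefficients.

    freq_weight is a hypothetical balancing factor chosen for this sketch,
    not a value taken from the paper.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Spatial term: mean absolute error between pixels.
    spatial = np.mean(np.abs(pred - target))

    # Frequency term: mean magnitude of the difference between the spectra.
    # fft2 transforms the last two axes (height, width); "ortho" keeps the
    # transform unitary so the scale does not grow with image size.
    diff_f = np.fft.fft2(pred, norm="ortho") - np.fft.fft2(target, norm="ortho")
    frequency = np.mean(np.abs(diff_f))

    return spatial + freq_weight * frequency
```

Calling `spatial_frequency_loss(restored, ground_truth)` on two arrays of the same shape returns a single scalar. The frequency term penalizes errors that are spread thinly over the whole image, such as a faint residual flare, which a purely per-pixel term weights only weakly.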
Abstract
An Under-Display Camera (UDC) places a digital camera lens beneath a display panel. However, this arrangement introduces complex degradations such as noise, blur, decreased transmittance, and flare. Despite remarkable progress, previous research on UDC mainly focuses on eliminating diffraction in the spatial domain and rarely explores the potential of the frequency domain. It is essential to consider both the spatial and frequency domains. For example, degradations such as noise and blur can be addressed with local information (e.g., CNN kernels in the spatial domain), whereas tackling flares may require global information (e.g., the frequency domain). In this paper, we revisit the UDC degradations in Fourier space and identify intrinsic frequency priors that indicate the presence of flares. Based on this observation, we propose a novel multi-level DNN architecture called SFIM. It efficiently restores UDC-distorted images by integrating local information with global information (the collective contribution of all points in the image). The architecture exploits CNNs to capture local information and FFT-based models to capture global information. SFIM comprises a Spatial Domain Block (SDB), a Frequency Domain Block (FDB), and an Attention-based Multi-level Integration Block (AMIB). Specifically, SDB focuses more on fine-detail degradations such as noise and blur, FDB emphasizes irregular texture loss over extensive areas such as flare, and AMIB enables effective cross-domain interaction. SFIM's superior performance over state-of-the-art approaches is demonstrated through rigorous quantitative and qualitative assessments across three UDC benchmarks.
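The claim that the frequency domain carries global information can be checked with a small NumPy experiment. The sketch below is not SFIM's FDB; the helper `frequency_filter` and the toy `gain` mask are hypothetical names introduced only to show that every Fourier coefficient depends on every pixel, so even a pointwise operation on the spectrum has an image-wide receptive field.

```python
import numpy as np

def frequency_filter(image, gain):
    """Scale each 2-D Fourier coefficient by `gain`, then return to pixels.

    Hypothetical helper for illustration; `gain` has the same shape as the
    spectrum and is applied pointwise.
    """
    spectrum = np.fft.fft2(image)
    return np.fft.ifft2(spectrum * gain).real

h, w = 8, 8
base = np.zeros((h, w))
bumped = base.copy()
bumped[3, 5] = 1.0  # change a single pixel

# Changing one pixel changes every coefficient of the spectrum.
spec_delta = np.abs(np.fft.fft2(bumped) - np.fft.fft2(base))
all_coeffs_changed = bool(np.all(spec_delta > 0))

# A pointwise spectral operation (here: keep only the DC coefficient)
# spreads that single pixel's contribution over the entire output.
gain = np.zeros((h, w))
gain[0, 0] = 1.0
out = frequency_filter(bumped, gain)
every_pixel_affected = bool(np.all(out > 0))
```

A 3x3 convolution applied to `bumped` would, by contrast, alter only the nine pixels around position (3, 5). This difference in reach is the intuition behind pairing CNN layers for local degradations with FFT-based processing for image-wide artifacts such as flare.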
