RAW-Domain Degradation Models for Realistic Smartphone Super-Resolution

Ali Mosleh, Faraz Ali, Fengjia Zhang, Stavros Tsogkas, Junyong Lee, Alex Levinshtein, Michael S. Brown

Abstract

Digital zoom on smartphones relies on learning-based super-resolution (SR) models that operate on RAW sensor images, but obtaining sensor-specific training data is challenging due to the lack of ground-truth images. Synthetic data generation via "unprocessing" pipelines offers a potential solution by simulating the degradations that transform high-resolution (HR) images into their low-resolution (LR) counterparts. However, these pipelines can introduce domain gaps due to incomplete or unrealistic degradation modeling. In this paper, we demonstrate that principled and carefully designed degradation modeling can enhance SR performance in real-world conditions. Instead of relying on generic priors for camera blur and noise, we model device-specific degradations through calibration and unprocess publicly available rendered images into the RAW domain of different smartphones. Using these image pairs, we train a single-image RAW-to-RGB SR model and evaluate it on real data from a held-out device. Our experiments show that accurate degradation modeling leads to noticeable improvements, with our SR model outperforming baselines trained on large pools of arbitrarily chosen degradations.
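
To make the idea of simulating HR-to-LR degradations concrete, the following is a minimal sketch of one synthetic degradation step. It is not the authors' pipeline: the Gaussian blur kernel, the RGGB Bayer layout, the affine noise-variance model, and all function names and parameter values are assumptions chosen for illustration. It also assumes the HR input is already in a linear sensor-like color space, so the inverse tone-mapping and color steps of a full unprocessing pipeline are omitted.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalized 2D Gaussian; a stand-in for a calibrated SR kernel (assumption)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-0.5 * (ax / sigma) ** 2)
    k = np.outer(g, g)
    return k / k.sum()

def blur(img, kernel):
    """Filter each channel with `kernel` (correlation, edge-replicated borders)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw), (0, 0)), mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def mosaic_rggb(rgb):
    """Keep one color sample per pixel following an assumed RGGB Bayer layout."""
    h, w, _ = rgb.shape
    raw = np.empty((h, w), dtype=rgb.dtype)
    raw[0::2, 0::2] = rgb[0::2, 0::2, 0]  # R
    raw[0::2, 1::2] = rgb[0::2, 1::2, 1]  # G (first green)
    raw[1::2, 0::2] = rgb[1::2, 0::2, 1]  # G (second green)
    raw[1::2, 1::2] = rgb[1::2, 1::2, 2]  # B
    return raw

def degrade(hr, kernel, scale, a, b, rng):
    """HR linear RGB -> noisy LR Bayer RAW under the assumptions stated above."""
    lr = blur(hr, kernel)[::scale, ::scale]           # blur, then subsample
    raw = mosaic_rggb(lr)                             # simulate the CFA
    sigma = np.sqrt(np.clip(a * raw + b, 0.0, None))  # assumed variance a*x + b
    noisy = raw + rng.normal(size=raw.shape) * sigma
    return np.clip(noisy, 0.0, 1.0)

rng = np.random.default_rng(0)
hr = rng.random((64, 64, 3))  # placeholder HR image with values in [0, 1]
lr_raw = degrade(hr, gaussian_kernel(9, 2.0), scale=4, a=1e-3, b=1e-5, rng=rng)
```

In a device-specific setting, the generic Gaussian kernel and the constants `a` and `b` would be replaced by values calibrated for the target camera, which is the part the paper argues matters most.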

Paper Structure (30 sections, 6 equations, 13 figures, 4 tables)

Figures (13)

  • Figure 1: Our SR approach incorporates realistic SR kernels and sensor noise functions, accurately modeled for various modern smartphones. A RAW image captured with the Pixel 6 Main camera is processed using a 4$\times$ SR model trained on data generated with our realistic degradations. Our model recovers more details and structures than the other baselines. The Pixel 6's specific degradations are not seen by the models during training.
  • Figure 2: Examples of calibration patterns and images used to approximate SR kernels. Gray-code patterns and color/gray patches displayed on a monitor, along with their RAW captures from the target camera, are used for geometric and radiometric alignment and to form HR target images. A pair of $\mathbf{x}$ and $\mathbf{y}$ used in Eq. \ref{eq:kernel_modeling} for 4$\times$ SR kernel estimation of the Pixel 9 Pro Tele camera is illustrated.
  • Figure 3: Noise model calibration results for the S23U Main camera. Noise calibration is performed per color channel of the CFA at seven different ISO levels; results for the first green channel are shown here. A curve is then fit to each noise parameter, enabling interpolation of noise variance for uncalibrated ISO levels.
  • Figure 4: Camera-specific SR results on data from four devices (from top to bottom: S23U Tele 1, S24U Tele 2, Pixel 9 Pro Main, and Pixel 9 Pro Tele). We compare the outputs of a RAW-to-RGB 4$\times$ SR model trained on data synthesized with different degradation baselines and with our own calibrated, camera-specific degradations. The model trained with our data produces sharper results and recovers more structural details. For better visualization, the original LR capture is linearly demosaicked and naively upsampled by 4$\times$. All images in each row are white-balanced using similar gains and gamma corrected to ensure a consistent visual comparison. The yellow dashed rectangle corresponds to the FOV of interest with respect to the 4$\times$ zoom factor.
  • Figure 5: Cross-camera SR results on Mi 11 Main real RAW captures. We compare the outputs of a RAW-to-RGB $4\times$ SR model, trained on data generated with different degradation modeling approaches. The degradations of the test camera are not seen by any of the models during training, yet the model trained with our synthetic data is better at recovering fine details such as building structures and text. The original LR capture is linearly demosaicked, and naive $4\times$ upsampling is applied for visualization. All images are white-balanced with similar gains, and gamma corrected. The yellow dashed rectangle represents the FOV of interest relative to the $4\times$ zoom factor.
  • ...and 8 more figures
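
The calibration described in the Figure 3 caption (estimate noise parameters at a set of ISO levels, then fit a curve to each parameter so the variance can be interpolated at uncalibrated ISOs) can be sketched as follows. This is only an illustration under stated assumptions: the affine variance model `variance = a * signal + b`, the quadratic curve in ISO, the specific ISO values, and the synthetic "measurements" are all hypothetical and not taken from the paper. It handles a single color channel; the caption indicates the real calibration is repeated per CFA channel.

```python
import numpy as np

# Hypothetical calibration setup: seven ISO levels, as in the caption.
# The values themselves are made up for this example.
ISOS = np.array([50, 100, 200, 400, 800, 1600, 3200], dtype=np.float64)

def fit_affine_noise(mean_signal, variance):
    """Fit variance = a * mean + b (assumed signal-dependent noise model)."""
    a, b = np.polyfit(mean_signal, variance, deg=1)
    return a, b

def fit_param_curve(isos, values, deg=2):
    """Fit a polynomial in (ISO / 100) to one noise parameter (assumed form)."""
    return np.polyfit(isos / 100.0, values, deg=deg)

def predict_variance(signal, iso, a_curve, b_curve):
    """Interpolate a and b at `iso`, then evaluate the assumed noise model."""
    a = np.polyval(a_curve, iso / 100.0)
    b = np.polyval(b_curve, iso / 100.0)
    return a * signal + b

def true_params(iso):
    """Synthetic ground truth used only to generate example measurements."""
    return 2e-6 * iso, 1e-10 * iso**2 + 1e-6

# Simulated per-ISO measurements: patch means and their variances.
means = np.linspace(0.05, 0.95, 10)
a_fit, b_fit = [], []
for iso in ISOS:
    a_true, b_true = true_params(iso)
    a, b = fit_affine_noise(means, a_true * means + b_true)
    a_fit.append(a)
    b_fit.append(b)

# One curve per noise parameter, then a query at an uncalibrated ISO.
a_curve = fit_param_curve(ISOS, np.array(a_fit))
b_curve = fit_param_curve(ISOS, np.array(b_fit))
var_600 = predict_variance(0.5, 600.0, a_curve, b_curve)
```

With real captures, `means` and the variances would come from flat patches of the calibration images rather than from `true_params`, and the curve family would be chosen to match how the measured parameters actually vary with ISO.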