Noise-free Optimization in Early Training Steps for Image Super-Resolution

MinKyu Lee, Jae-Pil Heo

TL;DR

This work addresses the ill-posed nature of image super-resolution by showing that standard pixel-wise training optimizes toward a single ground truth and neglects the distribution of plausible high-resolution (HR) images. It introduces a probabilistic decomposition of an HR image into an optimal centroid $\mu_{\text{true}}$ and inherent noise $\epsilon$, and derives a noise-free objective that targets the centroid. Since the centroid is intractable, it is approximated by an empirical centroid $\mu_{\text{emp}} = \hat f(x)$, the output of a pretrained network $\hat f$ on the low-resolution input $x$. The authors then propose Empirical Centroid-oriented Optimization (ECO), which uses the spatially consistent objective $\|\mu_{\text{emp}} - f(\downarrow(\mu_{\text{emp}}))\|$, where $\downarrow$ denotes downsampling, and mitigates the resulting estimation error with a mixup-based schedule that gradually reintroduces high-frequency supervision from real data. Empirical results on standard SR benchmarks and real-world settings show that ECO yields more stable early training, smoother loss and gradient behavior, and consistent performance gains across architectures and loss functions. The approach offers a practical, mixup-based way to account for the ill-posedness of SR, improving both optimization stability and final reconstruction quality.
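The decomposition above can be written out as follows. Rows 1 and 4 restate the notation of the TL;DR, reading $\mu_{\text{true}}$ as the conditional expectation of plausible HR images given $x$ (the "expectation over multiple potential HR images" of the abstract). Rows 2 and 3 are a standard expansion added here for illustration only; they assume a squared pixel loss, which may differ from the losses used in the paper.

```latex
\begin{align*}
  y^{*} &= \mu_{\text{true}} + \epsilon,
  \qquad \mu_{\text{true}} = \mathbb{E}[\,y \mid x\,],
  \qquad \mathbb{E}[\,\epsilon \mid x\,] = 0, \\
  \mathbb{E}\big[\|y^{*} - f(x)\|_2^2 \,\big|\, x\big]
    &= \|\mu_{\text{true}} - f(x)\|_2^2 + \mathbb{E}\big[\|\epsilon\|_2^2 \,\big|\, x\big], \\
  \nabla_{f(x)}\, \tfrac{1}{2}\|y^{*} - f(x)\|_2^2
    &= \big(f(x) - \mu_{\text{true}}\big) - \epsilon, \\
  \mathcal{L}_{\text{ECO}}
    &= \big\|\mu_{\text{emp}} - f\big(\downarrow(\mu_{\text{emp}})\big)\big\|,
  \qquad \mu_{\text{emp}} = \hat{f}(x).
\end{align*}
```

Row 2 says that, in expectation, the noise only adds a constant and leaves the minimizer unchanged; row 3 says that each individual ground truth still perturbs the gradient by $-\epsilon$, which is the kind of noisy early-training behavior the paper sets out to remove.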

Abstract

Recent deep-learning-based single image super-resolution (SISR) methods have shown impressive performance, and they are typically trained by minimizing the pixel-wise distance to a given high-resolution (HR) image. However, although this basic training scheme is the predominant choice, its suitability for ill-posed inverse problems has not been thoroughly investigated. In this work, we aim to better understand the training target by decomposing each target HR image into two subcomponents: (1) the optimal centroid, which is the expectation over multiple potential HR images, and (2) the inherent noise, defined as the residual between the HR image and the centroid. Our findings show that the current training scheme cannot capture the ill-posed nature of SISR and is vulnerable to the inherent noise term, especially during early training steps. To tackle this issue, we propose a novel optimization method that removes the inherent noise term in the early steps of vanilla training by estimating the optimal centroid and directly optimizing toward the estimate. Experimental results show that the proposed method enhances the stability of vanilla training, leading to overall performance gains. Code is available at github.com/2minkyulee/ECO.
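A toy numerical sketch of this decomposition follows. It is our own construction, not the authors' code: 1-D signals stand in for images, 2x average pooling (the hypothetical `downsample`) stands in for the degradation, and the plausible HR images are generated by adding detail that cancels inside each pooling window, so every candidate maps to the same LR input. With the sample mean as the centroid, the averaged vanilla gradient matches the noise-free gradient, while each individual ground truth shifts the gradient by its own noise term.

```python
import numpy as np

SCALE = 2  # hypothetical downsampling factor


def downsample(img: np.ndarray, scale: int = SCALE) -> np.ndarray:
    """Toy degradation: average-pool a 1-D signal over windows of `scale` samples."""
    return img.reshape(-1, scale).mean(axis=1)


rng = np.random.default_rng(0)
base = rng.uniform(0.0, 1.0, size=8)  # hypothetical low-frequency HR content

# Sample many plausible HR images. Each adds detail of the form [+d, -d] inside
# every pooling window, which averages to zero, so all candidates share one LR input.
candidates = []
for _ in range(1000):
    d = rng.normal(0.0, 0.1, size=base.size // SCALE)
    detail = np.stack([d, -d], axis=1).reshape(-1)
    candidates.append(base + detail)
candidates = np.stack(candidates)          # shape (1000, 8)

x = downsample(candidates[0])              # the shared LR input
mu_true = candidates.mean(axis=0)          # centroid (sample estimate of E[y | x])
eps = candidates - mu_true                 # inherent noise of each candidate

# For the loss 0.5 * ||y - f||^2, the gradient w.r.t. the prediction f is f - y.
f = np.full_like(base, 0.5)                # an arbitrary current prediction
grads_vanilla = f - candidates             # = (f - mu_true) - eps, one per ground truth
grad_centroid = f - mu_true                # noise-free gradient toward the centroid

print("all candidates share x:", all(np.allclose(downsample(c), x) for c in candidates))
print("mean vanilla grad == centroid grad:", np.allclose(grads_vanilla.mean(axis=0), grad_centroid))
print("max per-pixel std of vanilla grads:", grads_vanilla.std(axis=0).max())
```

Because the noise cancels only in expectation, a small mini-batch still sees a noisy gradient, whereas targeting the centroid removes that term. In practice the centroid cannot be computed this way, which is why ECO substitutes the empirical centroid $\mu_{\text{emp}} = \hat f(x)$.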

Paper Structure

This paper contains 39 sections, 14 equations, 9 figures, and 2 tables.

Figures (9)

  • Figure 1: Visualization of our method (ECO) compared to vanilla training and knowledge distillation (KD). Data points indicated in gray text are not available during training. Vanilla training is noisy because it is unaware of the inherent noise $\epsilon$, defined as the difference between a given HR image $y^*$ and the expectation over all possible HR images, $\mu_{\text{true}}$. KD, on the other hand, benefits from noise-free targets but suffers from spatial inconsistency between the input and target images, as formalized in the paper's KD construction equation. The proposed objective, $\|\mu_{\text{emp}} - f(\downarrow(\mu_{\text{emp}}))\|$, is noise-free while remaining spatially aligned. We then address the estimation error $\Delta\mu := \mu_\text{true} - \mu_\text{emp}$ that this objective introduces through a smooth transition from the proposed objective to the original objective. Remarkably, the overall solution can be greatly simplified with a mixup strategy (see the section on mixup): training starts from synthetic data pairs ($\alpha=0$) and gradually migrates to real data pairs ($\alpha=1$). This way, the network enjoys noise-free training during the early steps and is fine-tuned with supervision from real data samples in later steps.
  • Figure 2: Visualization of the maximum gradient difference and the loss variation. Spikes in the gradient difference indicate that the gradients are not well bounded (i.e., the gradient is not Lipschitz continuous).
  • Figure 3: Comparison of our method (without mixup) with KD and vanilla training on Set5, verifying the impact of spatial inconsistency in the training image pairs.
  • Figure 4: Validation results for vanilla training and the proposed method (without mixup) with mini-batch sizes of 2, 4, 8, and 16. The shaded regions indicate the minimum and maximum PSNR at each iteration across all settings. Noise-free optimization provides additional stability across all batch sizes.
  • Figure 5: Validation results over various mixup configurations. Without mixup, performance is limited because the proposed objective $\|\mu_{\text{emp}} - f(\downarrow(\mu_{\text{emp}}))\|$ neglects the estimation error $\Delta\mu$.
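The mixup transition described in the Figure 1 caption can be sketched as follows. This is a hedged reading of the caption, not the authors' implementation (that lives at github.com/2minkyulee/ECO). We assume that one coefficient α mixes both the inputs and the targets of the synthetic pair $(\downarrow(\mu_{\text{emp}}), \mu_{\text{emp}})$ and the real pair $(x, y^*)$, that α rises linearly over a warm-up period, and that the degradation is 2x average pooling. The names `downsample`, `linear_alpha`, `eco_pair`, `toy_teacher`, and `warmup_steps` are hypothetical, and the biased toy teacher merely stands in for a pretrained SR network.

```python
import numpy as np


def downsample(img: np.ndarray, scale: int) -> np.ndarray:
    """Hypothetical degradation: average pooling over non-overlapping scale x scale blocks."""
    h, w = img.shape
    return img.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))


def linear_alpha(step: int, warmup_steps: int) -> float:
    """Hypothetical schedule: alpha rises linearly from 0 (synthetic) to 1 (real), then stays at 1."""
    return min(max(step / warmup_steps, 0.0), 1.0)


def eco_pair(x, y, teacher, step, warmup_steps, scale):
    """Mix the synthetic pair (down(mu_emp), mu_emp) with the real pair (x, y) (assumed form)."""
    mu_emp = teacher(x)                          # empirical centroid from a pretrained network
    x_syn = downsample(mu_emp, scale)            # synthetic LR input, consistent with mu_emp
    alpha = linear_alpha(step, warmup_steps)
    inp = alpha * x + (1.0 - alpha) * x_syn      # network input
    tgt = alpha * y + (1.0 - alpha) * mu_emp     # regression target
    return inp, tgt, alpha


scale = 2
rng = np.random.default_rng(0)
y = rng.uniform(0.0, 1.0, size=(8, 8))           # toy HR image
x = downsample(y, scale)                         # its LR counterpart


def toy_teacher(lr: np.ndarray) -> np.ndarray:
    """Hypothetical pretrained SR model: nearest-neighbour upsampling plus a small bias,
    so that downsampling its output does not exactly reproduce the LR input."""
    return np.kron(lr, np.ones((scale, scale))) + 0.05


# KD trains on (x, teacher(x)); measure how far down(target) drifts from the input.
kd_gap = np.abs(downsample(toy_teacher(x), scale) - x).max()
print(f"KD pair: max |down(target) - input| = {kd_gap:.3f}")

for step in (0, 500, 1000):
    inp, tgt, alpha = eco_pair(x, y, toy_teacher, step, warmup_steps=1000, scale=scale)
    gap = np.abs(downsample(tgt, scale) - inp).max()
    print(f"alpha={alpha:.1f}: max |down(target) - input| = {gap:.1e}")
```

Because average pooling is linear and $x = \downarrow(y^*)$ in this toy setup, mixing both sides with the same α keeps every intermediate pair spatially consistent, whereas the KD pair $(x, \hat f(x))$ inherits whatever inconsistency the teacher has (the constant 0.05 bias here). Real SR datasets usually use bicubic downsampling instead, which is also linear, so the same argument would apply under the stated assumptions.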