Noise-free Optimization in Early Training Steps for Image Super-Resolution
MinKyu Lee, Jae-Pil Heo
TL;DR
This work addresses the ill-posed nature of image super-resolution (SR). It shows that standard pixel-wise training optimizes toward a single ground-truth image and neglects the distribution of plausible high-resolution (HR) images. The authors introduce a probabilistic decomposition of each HR image into an optimal centroid $\mu_{\text{true}}$ and inherent noise $\epsilon$, and derive a noise-free objective that targets the centroid. Because the centroid is intractable, they approximate it with an empirical centroid $\mu_{\text{emp}} = \hat f(x)$, the output of a pretrained network $\hat f$ on the low-resolution input $x$. They then propose Empirical Centroid-oriented Optimization (ECO). ECO trains with the spatially consistent objective $\|\mu_{\text{emp}} - f(\downarrow(\mu_{\text{emp}}))\|$, where $\downarrow$ denotes downsampling. It mitigates estimation error in $\mu_{\text{emp}}$ with a mixup-based schedule that gradually reintroduces high-frequency supervision from the original HR images. Experiments on standard SR benchmarks and in real-world settings show that ECO yields more stable early training, better-behaved losses and gradients, and consistent performance gains across architectures and loss functions. The approach is easy to combine with existing training pipelines and accounts for the ill-posedness of SR, improving both optimization stability and final reconstruction quality.
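The ECO objective and its mixup schedule can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. Box (average-pool) downsampling stands in for $\downarrow$, whereas SR benchmarks usually use bicubic. The pretrained output `mu_emp` is assumed to be precomputed. The helper names `downsample`, `mixup_weight`, and `eco_pair` and the linear schedule are hypothetical. The sketch reads "mixup-based scheduling" as blending the empirical-centroid pair $(\downarrow(\mu_{\text{emp}}), \mu_{\text{emp}})$ with the original LR/HR pair, using a weight on the original pair that grows over training.

```python
import numpy as np


def downsample(img: np.ndarray, scale: int) -> np.ndarray:
    """Box (average-pool) downsampling of an (H, W, C) image.

    Assumption: a stand-in for the bicubic kernel usually used to build SR pairs.
    """
    h, w, c = img.shape
    return img.reshape(h // scale, scale, w // scale, scale, c).mean(axis=(1, 3))


def mixup_weight(step: int, total_steps: int) -> float:
    """Hypothetical linear schedule: weight on the original pair grows from 0 to 1."""
    return min(step / total_steps, 1.0)


def eco_pair(x_lr, y_hr, mu_emp, step, total_steps, scale):
    """Return one (input, target) training pair for a given step.

    mu_emp is the empirical centroid f_hat(x_lr) from a pretrained network (assumed
    precomputed). At step 0 the pair is (downsample(mu_emp), mu_emp): the target has no
    inherent noise and the input is derived from the target itself. As step grows, the
    original (x_lr, y_hr) pair is mixed back in, reintroducing high-frequency detail.
    """
    lam = mixup_weight(step, total_steps)
    lr_in = (1.0 - lam) * downsample(mu_emp, scale) + lam * x_lr
    hr_target = (1.0 - lam) * mu_emp + lam * y_hr
    return lr_in, hr_target


# Usage with random stand-ins for a real HR patch and a pretrained network's output.
rng = np.random.default_rng(0)
y_hr = rng.random((8, 8, 3))
x_lr = downsample(y_hr, 2)
mu_emp = y_hr + 0.05 * rng.standard_normal(y_hr.shape)  # hypothetical f_hat(x_lr)
lr_in, hr_target = eco_pair(x_lr, y_hr, mu_emp, step=30, total_steps=100, scale=2)
# A network f would then be trained on, e.g., np.abs(f(lr_in) - hr_target).mean().
```

Box downsampling is linear. So when the original LR image is itself the downsampled HR image, the blended input equals the downsampled blended target at every step. Each mixed pair therefore stays spatially consistent while the target moves from $\mu_{\text{emp}}$ toward the original HR image.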
Abstract
Recent deep-learning-based single image super-resolution (SISR) methods have shown impressive performance, and they typically train their networks by minimizing the pixel-wise distance to a given high-resolution (HR) image. However, although this basic training scheme is the predominant choice, its use in the context of ill-posed inverse problems has not been thoroughly investigated. In this work, we aim to provide a better understanding of its underlying components by decomposing target HR images into two subcomponents: (1) the optimal centroid, which is the expectation over multiple potential HR images, and (2) the inherent noise, defined as the residual between the HR image and the centroid. Our findings show that the current training scheme cannot capture the ill-posed nature of SISR and becomes vulnerable to the inherent noise term, especially during early training steps. To tackle this issue, we propose a novel optimization method that can effectively remove the inherent noise term in the early steps of vanilla training by estimating the optimal centroid and directly optimizing toward that estimate. Experimental results show that the proposed method can effectively enhance the stability of vanilla training, leading to an overall performance gain. Code is available at github.com/2minkyulee/ECO.
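The decomposition into an optimal centroid and inherent noise can be illustrated numerically. The NumPy sketch below is a toy, not the paper's derivation. It assumes a flattened 16-pixel patch, a hypothetical centroid `mu_true`, and zero-mean Gaussian inherent noise, and it estimates the centroid by the sample mean over many plausible HR images. The helper names `decompose` and `expected_pixel_loss` are hypothetical. The sketch checks two standard identities. First, the average pixel-wise squared loss equals the distance to the centroid plus an irreducible noise term. Second, the gradient toward one given HR image differs from the noise-free gradient toward the centroid by exactly that image's noise.

```python
import numpy as np


def decompose(hr_samples: np.ndarray):
    """Split plausible HR images (one per row) into a centroid and per-sample noise.

    The sample mean is used as a stand-in for the expectation over potential HR images.
    """
    centroid = hr_samples.mean(axis=0)
    noise = hr_samples - centroid  # inherent noise: residual w.r.t. the centroid
    return centroid, noise


def expected_pixel_loss(pred: np.ndarray, hr_samples: np.ndarray) -> float:
    """Average squared pixel-wise distance from pred to every plausible HR image."""
    return float(np.mean(np.sum((pred - hr_samples) ** 2, axis=1)))


rng = np.random.default_rng(0)
mu_true = rng.random(16)  # hypothetical optimal centroid (flattened patch)
hr_samples = mu_true + 0.2 * rng.standard_normal((1000, 16))  # hypothetical plausible HRs
centroid, noise = decompose(hr_samples)
pred = np.zeros(16)  # e.g. an untrained network's output

# Pixel-wise loss = squared distance to the centroid + mean noise energy.
lhs = expected_pixel_loss(pred, hr_samples)
rhs = np.sum((pred - centroid) ** 2) + np.mean(np.sum(noise ** 2, axis=1))
print(np.isclose(lhs, rhs))  # True

# Gradient of ||pred - y||^2 for one given HR target vs. the gradient toward the centroid.
grad_single = 2 * (pred - hr_samples[0])
grad_centroid = 2 * (pred - centroid)
print(np.allclose(grad_single - grad_centroid, -2 * noise[0]))  # True
```

Vanilla training uses a single HR target per LR input, so that sample's noise residual enters every gradient step. A centroid-oriented objective removes this term.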
