Beyond the Ground Truth: Enhanced Supervision for Image Restoration

Donghun Ryou, Inju Ha, Sanghyeok Chu, Bohyung Han

TL;DR

The paper tackles the bottleneck of suboptimal real-world ground truth (GT) in image restoration with a supervision-enhancement framework. Super-resolved variants of each GT image, produced by one-step diffusion super-resolution, are fused with the original GT through a frequency-domain mixup guided by an adaptive mask generator. The resulting enhanced GT serves as the supervisory target for a lightweight Output Refinement Network (ORNet) that can be attached to existing pretrained restoration models. On GoPro deblurring and SIDD denoising, the approach yields consistent perceptual gains, improves robustness to unseen degradations, and is supported by quantitative metrics, qualitative results, and user studies. The framework is model-agnostic, computationally efficient, and offers a practical path to higher-fidelity restoration in real-world scenarios.

Abstract

Deep learning-based image restoration has achieved significant success. However, when addressing real-world degradations, model performance is limited by the quality of ground-truth images in datasets due to practical constraints in data acquisition. To address this limitation, we propose a novel framework that enhances existing ground truth images to provide higher-quality supervision for real-world restoration. Our framework generates perceptually enhanced ground truth images using super-resolution by incorporating adaptive frequency masks, which are learned by a conditional frequency mask generator. These masks guide the optimal fusion of frequency components from the original ground truth and its super-resolved variants, yielding enhanced ground truth images. This frequency-domain mixup preserves the semantic consistency of the original content while selectively enriching perceptual details, preventing hallucinated artifacts that could compromise fidelity. The enhanced ground truth images are used to train a lightweight output refinement network that can be seamlessly integrated with existing restoration models. Extensive experiments demonstrate that our approach consistently improves the quality of restored images. We further validate the effectiveness of both supervision enhancement and output refinement through user studies. Code is available at https://github.com/dhryougit/Beyond-the-Ground-Truth.
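The frequency-domain mixup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper learns its masks with a conditional frequency mask generator, whereas the hypothetical `radial_mask` below is a fixed, hand-picked low-pass weighting used only as a stand-in. The function names, the `cutoff` and `softness` parameters, and the assumption that both images are same-sized grayscale arrays are all illustrative choices.

```python
import numpy as np


def radial_mask(height, width, cutoff=0.25, softness=0.05):
    """Hypothetical fixed mask: close to 1 at low frequencies, close to 0 at high ones.

    Stands in for the learned, image-conditioned masks; not the paper's generator.
    """
    fy = np.fft.fftfreq(height)[:, None]  # cycles per pixel, in [-0.5, 0.5)
    fx = np.fft.fftfreq(width)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    # Smooth sigmoid roll-off around the cutoff radius.
    return 1.0 / (1.0 + np.exp((radius - cutoff) / softness))


def frequency_mixup(gt, sr, mask):
    """Blend two same-sized grayscale images in the Fourier domain.

    `mask` weights the original GT spectrum; `1 - mask` weights the SR spectrum.
    """
    gt_spec = np.fft.fft2(gt)
    sr_spec = np.fft.fft2(sr)
    mixed = mask * gt_spec + (1.0 - mask) * sr_spec
    # For a mask symmetric in frequency the result is real up to rounding error.
    return np.fft.ifft2(mixed).real


# Toy usage: random arrays stand in for a GT image and its super-resolved variant.
rng = np.random.default_rng(0)
gt = rng.random((64, 64))
sr = gt + 0.1 * rng.standard_normal((64, 64))
enhanced = frequency_mixup(gt, sr, radial_mask(64, 64))
```

With this particular mask, low-frequency content (overall structure) comes almost entirely from the original GT, while high-frequency detail is drawn mostly from the super-resolved variant. That mirrors the stated goal of preserving semantic consistency while enriching detail, although a learned mask need not follow this simple low-pass pattern.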

Paper Structure

This paper contains 43 sections, 10 equations, 19 figures, 12 tables.

Figures (19)

  • Figure 1: Visualization of our enhanced ground truth (GT). Our enhanced GT not only demonstrates sharper text and superior perceptual quality but also maintains semantic consistency with respect to the original GT. Zoom in for better visualization.
  • Figure 2:
  • Figure 3: Qualitative comparison of state-of-the-art deblurring methods, including ours (ORNet applied to FFTformer), on the GoPro dataset. Our method significantly improves the visual quality of the deblurred image. Zoom in for better visualization.
  • Figure 4: Qualitative comparison of state-of-the-art denoising methods, including ours (ORNet applied to NAFNet), on the SIDD dataset. Our method significantly improves the visual quality of the denoised image. Zoom in for better visualization.
  • Figure 5: The top row shows the results when the conditional frequency mask generator is trained using our method. The second row shows the results when it is trained in an element-wise manner without the ring-shaped Gaussian basis in the frequency domain. The bottom row shows the results when it is trained in an element-wise manner in the spatial domain. $M_i$ denotes the generated masks, and $\hat{I}^{\text{GT}}$ represents the enhanced ground truth generated using these masks. Zoom in for better visualization.
  • ...and 14 more figures