Learning from History: Task-agnostic Model Contrastive Learning for Image Restoration

Gang Wu, Junjun Jiang, Kui Jiang, Xianming Liu

TL;DR

This paper tackles ill-posed image restoration by proposing a task-agnostic model contrastive learning framework (MCLIR) that generates negative samples from a latency model of the target network. The core idea is to train the current model $f_\theta$ against negatives produced by $f_{\theta'}$, whose parameters $\theta'$ evolve as an exponential moving average of $\theta$, yielding adaptive, curriculum-like hard negatives. A Self-Prior guided Negative loss (SPN) operates in a perceptual embedding space, with $f^{Rec}=\text{VGG}(I^{Rec})$ and $f^{Neg}=\text{VGG}(I^{Neg})$, and is defined as $\mathcal{L}_{neg}=\|f^{Rec}-f^{Neg}\|_1$. The total objective is $\mathcal{L}=\mathcal{L}_{rec}-\lambda \mathcal{L}^{N}_{neg}$, where multiple negatives $N$ can optionally be used to strengthen regularization. Empirically, retraining diverse models for super-resolution (SR), dehazing, deblurring, and deraining improves PSNR/SSIM across datasets and architectures, with gains of up to $3.41$ dB in indoor dehazing. The results suggest that SPN-based model contrastive learning is a versatile, generalizable enhancement for low-level vision tasks that reduces reliance on task-specific negative design.
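
The two ingredients summarized in the TL;DR, the moving-average update of the negative model and the objective $\mathcal{L}=\mathcal{L}_{rec}-\lambda \mathcal{L}^{N}_{neg}$, can be illustrated with a small framework-free sketch. It is not the authors' implementation: the function names, the `decay` and `lam` values, the use of an L1 reconstruction loss, and the averaging over the $N$ negatives are all assumptions made for illustration, and `embed` stands in for the VGG feature extractor.

```python
# Illustrative sketch only; every name and default value here is hypothetical.

def ema_update(neg_params, target_params, decay=0.999):
    """Update the negative (latency) model's parameters as an exponential
    moving average of the target model's parameters."""
    return [decay * n + (1.0 - decay) * t
            for n, t in zip(neg_params, target_params)]


def l1_distance(a, b):
    """L1 norm of the difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def spn_objective(rec, gt, negatives, embed, lam=0.1):
    """Compute L = L_rec - lam * L_neg.

    rec:       output of the target model (flat list of floats)
    gt:        ground-truth image (flat list of floats)
    negatives: outputs of the negative model(s) for the same input
    embed:     stand-in for the perceptual feature extractor (VGG in the paper)
    """
    # Assumption: the reconstruction term is an L1 loss in pixel space.
    l_rec = l1_distance(rec, gt)
    f_rec = embed(rec)
    # Assumption: with several negatives, the negative term is averaged.
    l_neg = sum(l1_distance(f_rec, embed(n)) for n in negatives) / len(negatives)
    # Subtracting the negative term rewards outputs that are far from the
    # negatives in feature space.
    return l_rec - lam * l_neg
```

In an actual training loop, `ema_update` would be called after each optimizer step on the target model, and the negative model would be run without gradients to produce `negatives`.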

Abstract

Contrastive learning has emerged as a prevailing paradigm for high-level vision tasks, which, by introducing proper negative samples, has also been exploited for low-level vision tasks to achieve a compact optimization space to account for their ill-posed nature. However, existing methods rely on manually predefined and task-oriented negatives, which often exhibit pronounced task-specific biases. To address this challenge, our paper introduces an innovative method termed 'learning from history', which dynamically generates negative samples from the target model itself. Our approach, named Model Contrastive Learning for Image Restoration (MCLIR), rejuvenates latency models as negative models, making it compatible with diverse image restoration tasks. We propose the Self-Prior guided Negative loss (SPN) to enable it. This approach significantly enhances existing models when retrained with the proposed model contrastive paradigm. The results show significant improvements in image restoration across various tasks and architectures. For example, models retrained with SPN outperform the original FFANet and DehazeFormer by 3.41 dB and 0.57 dB, respectively, on the RESIDE indoor dataset for image dehazing. Similarly, they achieve notable improvements of 0.47 dB on SPA-Data over IDT for image deraining and 0.12 dB on Manga109 for 4x super-resolution over lightweight SwinIR. Code and retrained models are available at https://github.com/Aitical/MCLIR.

Paper Structure (29 sections, 6 equations, 5 figures, 9 tables)

Figures (5)

  • Figure 1: Illustration of the proposed model contrastive paradigm in a common optimization space. For the target model $f_{\theta}$, the paradigm smoothly draws negative samples from the latency model $f_{\theta'}$. Compared with the task-oriented negatives of previous work, it is task-agnostic and applies to various image restoration tasks. It adaptively provides a compact optimization space, pushing the target model $f_{\theta}$ closer to the assumed optimum $f_{\theta^{*}}$.
  • Figure 2: Comparisons between models retrained by our proposed model contrastive paradigm and the originals. Retrained models can achieve remarkable improvements on various image restoration tasks.
  • Figure 3: Visual comparison for image super-resolution. Results are shown for the original SwinIR models and for those retrained with our model contrastive paradigm; the retrained models give visibly better reconstructions.
  • Figure 4: Visual comparison between IDT and our retrained model. A divergence map shows the differences between the IDT output and ours, highlighting the improvement, particularly in degraded regions.
  • Figure 5: Visual results of FFANet and our retrained model for image dehazing.