ConStyle v2: A Strong Prompter for All-in-One Image Restoration

Dongqi Fan, Junhao Zhang, Liang Chang

TL;DR

ConStyle v2 tackles the impracticality of all-in-one image restoration by providing a strong visual prompter that guides a generic restoration network without degradation-specific priors. The authors introduce a two-stage training regime with unsupervised pre-training, a pretext classification task, and knowledge distillation, paired with a Mix Degradations dataset to enable robust handling of multiple degradations. Across multiple backbones (Restormer, NAFNet, MAXIM-1S, and a vanilla CNN), ConStyle v2 delivers significant gains in PSNR/SSIM for all-in-one restoration and improves performance on certain single-degradation tasks, while mitigating model collapse. The work delivers a practical, plug-and-play module that broadens IR applicability, and contributes a scalable dataset resource for multi-degradation training, though some degradation types remain challenging and require further data-generation improvements.

Abstract

This paper introduces ConStyle v2, a strong plug-and-play prompter designed to output clean visual prompts and assist U-Net Image Restoration models in handling multiple degradations. The joint training process of IRConStyle, an Image Restoration framework consisting of ConStyle and a general restoration network, is divided into two stages: first, ConStyle is pre-trained alone; then its weights are frozen to guide the training of the general restoration network. Three improvements are proposed for the pre-training stage: unsupervised pre-training, a pretext task (i.e., classification), and knowledge distillation. Without bells and whistles, we obtain ConStyle v2, a strong prompter for all-in-one Image Restoration, in less than two GPU days and without any fine-tuning. Extensive experiments on Restormer (transformer-based), NAFNet (CNN-based), MAXIM-1S (MLP-based), and a vanilla CNN demonstrate that ConStyle v2 can turn any U-Net-style Image Restoration model into an all-in-one Image Restoration model. Furthermore, models guided by the well-trained ConStyle v2 outperform those guided by ConStyle on some specific degradations.
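
The two-stage scheme described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `FrozenPrompter`, `RestorationNet`, and `train_stage_two` are hypothetical names, scalars stand in for images, and a single fixed weight stands in for the pre-trained ConStyle v2 encoder. The sketch only shows the second stage, in which the prompter's weights stay fixed while the restoration network learns from the degraded input and the prompt.

```python
import random


class FrozenPrompter:
    """Hypothetical stand-in for a prompter pre-trained in stage one."""

    def __init__(self, weight):
        self.weight = weight  # assumed to come from stage-one pre-training

    def __call__(self, degraded):
        # Emit a "prompt" for the degraded input; nothing here is trainable.
        return self.weight * degraded


class RestorationNet:
    """Toy restoration network: a linear mix of the input and the prompt."""

    def __init__(self):
        self.w_input = 0.0
        self.w_prompt = 0.0

    def __call__(self, degraded, prompt):
        return self.w_input * degraded + self.w_prompt * prompt

    def sgd_step(self, degraded, prompt, clean, lr):
        error = self(degraded, prompt) - clean
        # Gradients of 0.5 * error**2 with respect to the two weights.
        self.w_input -= lr * error * degraded
        self.w_prompt -= lr * error * prompt
        return 0.5 * error ** 2


def train_stage_two(prompter, net, pairs, epochs=200, lr=0.05):
    """Train only the restoration network; the prompter is never updated."""
    losses = []
    for _ in range(epochs):
        total = 0.0
        for degraded, clean in pairs:
            prompt = prompter(degraded)
            total += net.sgd_step(degraded, prompt, clean, lr)
        losses.append(total / len(pairs))
    return losses


random.seed(0)
# Toy "degradation": the degraded value is half of the clean value.
pairs = [(0.5 * c, c) for c in (random.uniform(-1, 1) for _ in range(32))]

prompter = FrozenPrompter(weight=1.5)
net = RestorationNet()
weight_before = prompter.weight
losses = train_stage_two(prompter, net, pairs)
```

In a real framework the same effect would typically come from disabling gradients on the prompter's parameters before building the optimizer for the restoration network.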

Paper Structure

This paper contains 23 sections, 3 equations, 12 figures, 13 tables.

Figures (12)

  • Figure 1: Different ways to solve multiple degradations. (a) The priors are obtained from as many sub-networks as there are degradations. (b) A degraded-clean example pair must be provided in both the training and inference stages. (c) ConStyle v2 adaptively outputs clean visual prompts according to different degradations to guide the training of the general restoration network.
  • Figure 2: The training diagram of ConStyle v2. Only the Encoder is retained once training is complete.
  • Figure 3: The difference between ConStyle (a) and ConStyle v2 (b), and the detailed structure of the Momentum Encoder and Encoder (c). Con. Loss and Cross. Loss are abbreviations of Content Loss and CrossEntropy Loss. In ConStyle, the Momentum Encoder and Queue are removed only in the inference stage, whereas in ConStyle v2 they are removed once pre-training is finished.
  • Figure 4: The detailed structure of the original models (a)(b)(c)(d) and the ConStyle/ConStyle v2 models (e)(f)(g)(h). DC represents the downsample and concat operation, and UC represents the upsample and concat operation.
  • Figure 5: Original Conv (a), ConStyle Conv (b), Restormer (c), and ConStyle Restormer (d) trained on Mix Degradations datasets.
  • ...and 7 more figures
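
The Figure 3 caption names a Momentum Encoder and a Queue but does not say how they are updated. A minimal sketch, assuming the common momentum-contrast convention (an exponential moving average of the Encoder weights and a fixed-size first-in-first-out queue of features), could look like the following. The function name `momentum_update`, the coefficient `m`, and the queue size are illustrative assumptions, not values taken from the paper.

```python
from collections import deque


def momentum_update(encoder_params, momentum_params, m=0.999):
    """Return momentum-encoder weights moved slightly toward the encoder.

    Assumed rule: new_k = m * old_k + (1 - m) * q, applied per parameter.
    """
    return [m * k + (1.0 - m) * q
            for q, k in zip(encoder_params, momentum_params)]


# Flat lists of floats stand in for network weights.
encoder = [1.0, 2.0]
momentum_encoder = [0.0, 0.0]
momentum_encoder = momentum_update(encoder, momentum_encoder, m=0.9)

# A fixed-size queue: appending beyond maxlen silently drops the oldest entry.
feature_queue = deque(maxlen=3)
for feature_id in range(5):
    feature_queue.append(feature_id)
```

Under this assumption, both structures exist only to supply training signal, which is consistent with the caption's note that they can be discarded once they are no longer needed.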