
Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration

Akshay Dudhane, Omkar Thawakar, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang

TL;DR

This work proposes DyNet, a dynamic family of networks designed in an encoder-decoder style for all-in-one image restoration tasks, and introduces a dynamic pre-training strategy that trains variants of the proposed DyNet concurrently, thereby achieving a 50% reduction in GPU hours.

Abstract

All-in-one image restoration handles different types of degradation with a single unified model instead of separate task-specific, non-generic models for each degradation. The requirement to handle multiple degradations with the same model can lead to high-complexity designs with fixed configurations that cannot adapt to more efficient alternatives. We propose DyNet, a dynamic family of networks designed in an encoder-decoder style for all-in-one image restoration tasks. DyNet can seamlessly switch between its bulkier and lightweight variants, offering flexibility for efficient model deployment with a single round of training. This seamless switching is enabled by our weight-sharing mechanism, which forms the core of our architecture and facilitates the reuse of initialized module weights. Further, to establish robust weight initialization, we introduce a dynamic pre-training strategy that trains variants of the proposed DyNet concurrently, achieving a 50% reduction in GPU hours. Our dynamic pre-training strategy eliminates the need to maintain separate checkpoints for each variant, as all models share a common set of checkpoints, varying only in model depth. This strategy significantly reduces storage overhead and enhances adaptability. To address the lack of a large-scale dataset for pre-training, we curate a high-quality, high-resolution image dataset named Million-IRD, containing 2M image samples. We validate DyNet on image denoising, deraining, and dehazing in the all-in-one setting, achieving state-of-the-art results with a 31.34% reduction in GFLOPs and a 56.75% reduction in parameters compared to baseline models. The source code and trained models are available at https://github.com/akshaydudhane16/DyNet.
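The weight-sharing idea can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not DyNet's implementation: `ToyBlock`, `SharedLevel`, `build_variant`, and the other names are hypothetical, each "block" is a simple element-wise residual update standing in for a transformer block, and the repeat counts for the large and small variants are made up. What it shows is that both variants reference the same per-level block objects, so switching variants changes depth but not the stored weights.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyBlock:
    """Hypothetical stand-in for one transformer block (not DyNet's real block)."""
    weights: List[float]

    def forward(self, x: List[float]) -> List[float]:
        # Toy residual update y = x + w * x, applied element-wise.
        return [xi + wi * xi for xi, wi in zip(x, self.weights)]

    def num_params(self) -> int:
        return len(self.weights)


@dataclass
class SharedLevel:
    """One encoder/decoder level: a single block whose weights are reused `repeats` times."""
    block: ToyBlock
    repeats: int

    def forward(self, x: List[float]) -> List[float]:
        for _ in range(self.repeats):
            x = self.block.forward(x)  # the same weights on every pass
        return x


def build_variant(blocks: List[ToyBlock], repeats: List[int]) -> List[SharedLevel]:
    """Build a variant that references (does not copy) the per-level shared blocks."""
    return [SharedLevel(block, r) for block, r in zip(blocks, repeats)]


def run(levels: List[SharedLevel], x: List[float]) -> List[float]:
    for level in levels:
        x = level.forward(x)
    return x


def stored_params(levels: List[SharedLevel]) -> int:
    # Each level stores one block, however often it is repeated.
    return sum(level.block.num_params() for level in levels)


def unshared_params(levels: List[SharedLevel]) -> int:
    # Cost of the same depth if every repeated block had its own weights.
    return sum(level.block.num_params() * level.repeats for level in levels)


dim = 4
blocks = [ToyBlock([0.01] * dim) for _ in range(4)]  # one initialized block per level
large = build_variant(blocks, [4, 6, 6, 8])  # hypothetical repeat counts
small = build_variant(blocks, [2, 3, 3, 4])  # hypothetical repeat counts

print(stored_params(large), stored_params(small))      # 16 16
print(unshared_params(large), unshared_params(small))  # 96 48
```

Because `large` and `small` hold references to the same `ToyBlock` objects, any update to `blocks` during training is immediately visible to both variants, which is the property that lets a single round of training serve several depths.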

Paper Structure (20 sections, 1 equation, 13 figures, 7 tables)

Figures (13)

  • Figure 1: Left: At any given encoder-decoder level: (a) transformer blocks in PromptIR; (b) and (c) our DyNet-L and DyNet-S, which use the proposed weight-sharing mechanism, initializing one transformer block and sharing its weights with the subsequent blocks. Right: Average PSNR in the all-in-one IR setting vs. GFLOPs and parameters (in millions). Our DyNet-S improves performance by 0.43 dB while reducing GFLOPs by 31.34% and parameters by 56.75% compared to PromptIR.
  • Figure 2: The proposed Dynamic Network (DyNet) pipeline for all-in-one image restoration. DyNet enhances a low-quality input image using a 4-level encoder-decoder architecture. A distinctive aspect of DyNet is its weight-sharing strategy: at each level, the initial transformer block's weights are shared with the subsequent blocks, significantly reducing the number of network parameters and enhancing flexibility. This allows DyNet's complexity to be adjusted easily, switching between large and small variants by changing the frequency of weight sharing across the encoder-decoder blocks. Moreover, we maintain encoder-decoder feature consistency by implicitly learning degradation-aware prompts at the skip connections rather than on the decoder side as in PromptIR.
  • Figure 3: The proposed dynamic pre-training strategy. Given a clean image, we create a degraded version by injecting noise (Gaussian or random), JPEG artifacts, and random masking into the same image. Two variants (small and large) of DyNet are then trained concurrently to reconstruct the clean image from the masked, degraded input. Notably, the weights are shared between both variants, since they are based on the same architecture with an intra-network weight-sharing scheme and differ only in the frequency of block repetition. In a single forward pass, one of the two parallel branches is randomly activated (Flag is the binary indicator of branch activation); dotted lines indicate an inactive path (Flag=0). An L1 loss on the pixel differences between the reconstructed output and the target is used to update the shared weights of both branches. Together, inter- and intra-model weight sharing and the random activation of branches significantly reduce the GPU hours required for pre-training.
  • Figure 4: On the left: Sample images from our Million-IRD dataset, a diverse collection of high-quality, high-resolution photographs. It includes a variety of textures (including intricate ones), nature scenes, sports activities, daytime and nighttime images, wildlife, close-up and distant shots, forest scenes, monuments, etc. On the right: Sample low-quality images filtered out during the data pre-processing phase (see the data collection section). These images were excluded for being blurry, watermarked, predominantly composed of flat regions, e-commerce product photos, or noisy or corrupted by artifacts.
  • Figure 5: Comparative analysis of image denoising by all-in-one methods on the BSD68 and Urban100 datasets. DyNet reduces noise and produces sharper, clearer images than PromptIR.
  • ...and 8 more figures
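The pre-training loop described in the Figure 3 caption can be sketched as follows. This is a toy, stdlib-only illustration under explicit assumptions, not the authors' code: the "image" is a 1-D signal, the shared "block" is a hypothetical 3-tap filter, the repeat counts (2 for small, 4 for large), noise level, mask ratio, and learning rate are made up, JPEG artifacts are omitted, and gradients come from finite differences instead of autograd. It keeps the structure the caption describes: one degraded, masked input per step, a random flag that activates one branch, an L1 loss against the clean target, and a single update to weights shared by both branches.

```python
import math
import random
from typing import List, Tuple


def degrade(clean: List[float], rng: random.Random,
            sigma: float = 0.1, mask_ratio: float = 0.25) -> List[float]:
    # Noise: Gaussian or uniform ("random"), picked per sample. JPEG artifacts are omitted.
    if rng.random() < 0.5:
        noisy = [c + rng.gauss(0.0, sigma) for c in clean]
    else:
        noisy = [c + rng.uniform(-sigma, sigma) for c in clean]
    # Random masking: zero out roughly `mask_ratio` of the positions.
    return [0.0 if rng.random() < mask_ratio else v for v in noisy]


def shared_block(x: List[float], k: List[float]) -> List[float]:
    # Hypothetical toy "block": a 3-tap 1-D filter with edge replication.
    n = len(x)
    return [k[0] * x[max(i - 1, 0)] + k[1] * x[i] + k[2] * x[min(i + 1, n - 1)]
            for i in range(n)]


def forward(x: List[float], k: List[float], repeats: int) -> List[float]:
    # A variant = the same shared block (same weights k) applied `repeats` times.
    for _ in range(repeats):
        x = shared_block(x, k)
    return x


def l1(a: List[float], b: List[float]) -> float:
    return sum(abs(ai - bi) for ai, bi in zip(a, b)) / len(a)


def numeric_grad(k, x, target, repeats, eps=1e-4) -> List[float]:
    # Central finite differences keep the sketch stdlib-only; a real setup would use autograd.
    grad = []
    for j in range(len(k)):
        kp, km = list(k), list(k)
        kp[j] += eps
        km[j] -= eps
        diff = l1(forward(x, kp, repeats), target) - l1(forward(x, km, repeats), target)
        grad.append(diff / (2 * eps))
    return grad


def pretrain(steps: int, repeats: Tuple[int, int] = (2, 4),
             lr: float = 0.01, seed: int = 0):
    rng = random.Random(seed)
    k = [0.0, 1.0, 0.0]  # shared weights, initialized to the identity filter
    clean = [0.5 + 0.4 * math.sin(2 * math.pi * i / 32) for i in range(32)]  # toy clean signal
    counts = {"small": 0, "large": 0}
    block_evals = 0  # block applications in the per-step forward pass (gradient evals not counted)
    for _ in range(steps):
        degraded = degrade(clean, rng)
        flag = rng.random() < 0.5  # True activates the large branch, False the small one
        name, r = ("large", repeats[1]) if flag else ("small", repeats[0])
        counts[name] += 1
        block_evals += r
        grad = numeric_grad(k, degraded, clean, r)
        k = [ki - lr * gi for ki, gi in zip(k, grad)]  # one update to the shared weights
    return k, counts, block_evals


k, counts, block_evals = pretrain(200)
print(counts, block_evals, "vs", 200 * sum((2, 4)), "if both branches ran every step")
```

Because only the activated branch runs at each step, the forward cost per step is that of one variant rather than both, which is the intuition behind the GPU-hour savings the caption attributes to random branch activation.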