FoundIR-v2: Optimizing Pre-Training Data Mixtures for Image Restoration Foundation Model

Xiang Chen, Jinshan Pan, Jiangxin Dong, Jian Yang, Jinhui Tang

TL;DR

The paper tackles universal image restoration by optimizing how training data from different tasks are mixed. It proposes FoundIR-v2, which combines data equilibrium scheduling, to balance multi-task learning, with a Mixture-of-Experts (MoE) diffusion scheduler that allocates task-specific priors in a latent diffusion framework. Key contributions include formalizing data mixing laws for restoration, integrating MoE-guided priors, and demonstrating strong performance across 50+ sub-tasks and numerous benchmarks. The results indicate that scheduling both the data and the model dynamically improves generalization and practical applicability in real-world restoration scenarios.

Abstract

Recent studies have witnessed significant advances in image restoration foundation models driven by improvements in the scale and quality of pre-training data. In this work, we find that the data mixture proportions from different restoration tasks are also a critical factor directly determining the overall performance of all-in-one image restoration models. To this end, we propose a high-capacity diffusion-based image restoration foundation model, FoundIR-v2, which adopts a data equilibrium scheduling paradigm to dynamically optimize the proportions of mixed training datasets from different tasks. By leveraging the data mixing law, our method ensures a balanced dataset composition, enabling the model to achieve consistent generalization and comprehensive performance across diverse tasks. Furthermore, we introduce an effective Mixture-of-Experts (MoE)-driven scheduler into generative pre-training to flexibly allocate task-adaptive diffusion priors for each restoration task, accounting for the distinct degradation forms and levels exhibited by different tasks. Extensive experiments demonstrate that our method can address over 50 sub-tasks across a broader scope of real-world scenarios and achieves favorable performance against state-of-the-art approaches.
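
The abstract describes dynamically adjusting the proportions of training data drawn from each restoration task. As a rough illustration of that general idea only, the sketch below shifts sampling weight toward tasks whose validation loss is above average, using a multiplicative (exponentiated) update. This is not the paper's data equilibrium scheduling or its data mixing law, neither of which is specified in this summary; the function names `rebalance` and `sample_tasks`, the `step_size` parameter, and the loss values are all hypothetical.

```python
import math
import random


def rebalance(proportions, val_losses, step_size=0.5):
    """Hypothetical update rule: raise the share of tasks with above-average loss.

    Each proportion is multiplied by exp(step_size * (loss - mean_loss)) and the
    result is renormalized so the proportions sum to 1.
    """
    mean_loss = sum(val_losses.values()) / len(val_losses)
    raw = {
        task: proportions[task] * math.exp(step_size * (val_losses[task] - mean_loss))
        for task in proportions
    }
    total = sum(raw.values())
    return {task: weight / total for task, weight in raw.items()}


def sample_tasks(proportions, batch_size, rng):
    """Draw the task label for each item in a batch according to the mixture."""
    tasks = list(proportions)
    weights = [proportions[task] for task in tasks]
    return rng.choices(tasks, weights=weights, k=batch_size)


# Four tasks, matching the restricted setting described for Figure 1.
tasks = ["deblurring", "dehazing", "low_light", "sr"]
proportions = {task: 1.0 / len(tasks) for task in tasks}  # start from a uniform mixture

# Made-up validation losses, for illustration only.
val_losses = {"deblurring": 0.9, "dehazing": 0.4, "low_light": 0.6, "sr": 0.5}

new_proportions = rebalance(proportions, val_losses)
batch = sample_tasks(new_proportions, batch_size=8, rng=random.Random(0))
```

In a training loop, a rule of this kind would be re-run periodically with fresh per-task validation losses, so that the mixture keeps tracking whichever tasks currently lag behind.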

Paper Structure

This paper contains 11 sections, 5 equations, 8 figures, 2 tables, 1 algorithm.

Figures (8)

  • Figure 1: Statistical analysis of the relationship between data mixture proportions and restoration performance for an all-in-one foundation model. To facilitate clearer observations, we restrict the experimental analysis to four tasks: deblurring, dehazing, low-light enhancement, and SR. In the figure, the pie chart illustrates the training data distribution across different tasks, while the bar chart reports the corresponding PSNR results on each task’s test set.
  • Figure 2: Illustration of the proposed FoundIR-v2. We adopt a dual scheduling strategy for both the training data and the base model: (i) data equilibrium scheduling is introduced into the pre-training process to dynamically optimize the data mixture, and (ii) an MoE-driven scheduler is integrated into the model to dynamically allocate task-adaptive diffusion priors. For low-resolution LQ inputs, FoundIR-v2 allows users to choose at test time whether to retain the original image resolution or apply an SR operation.
  • Figure 3: Visual example of high-quality GT data filtering.
  • Figure 4: Visual comparison of image restoration results on the FoundIR-L+N and RealPhoto60 benchmarks. Zoom in for a better view.
  • Figure 5: Visual comparison of mural restoration results.
  • ...and 3 more figures