MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration
Yulin Ren, Xin Li, Bingchen Li, Xingrui Wang, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen
TL;DR
MoE-DiffIR addresses universal compressed image restoration (CIR) across diverse codecs by learning task-customized diffusion priors from Stable Diffusion. It introduces a Mixture-of-Experts prompt module with a degradation-aware router, plus a visual-to-text adapter that exploits Stable Diffusion's cross-modal prior, to improve texture restoration at low bitrates. The model is trained with a two-stage fine-tuning regime and evaluated on a new CIR benchmark covering 21 degradation types from 7 codecs, where the authors report better perceptual quality (LPIPS/FID) and competitive fidelity (PSNR/SSIM) than state-of-the-art diffusion-based IR methods. The work shows the practical potential of universal CIR with diffusion priors and cross-modal guidance, while noting limitations at extreme bitrates and suggesting directions for improvement.
Abstract
We present MoE-DiffIR, a universal compressed image restoration (CIR) method with task-customized diffusion priors. It aims to address two pivotal challenges in existing CIR methods: (i) a lack of adaptability and universality across different image codecs, e.g., JPEG and WebP; (ii) poor texture generation capability, particularly at low bitrates. Specifically, MoE-DiffIR introduces a mixture-of-experts (MoE) prompt module, in which a set of basic prompts cooperate to excavate the task-customized diffusion priors from Stable Diffusion (SD) for each compression task. Moreover, a degradation-aware routing mechanism is proposed to enable the flexible assignment of basic prompts. To activate and reuse the cross-modality generation prior of SD, we design a visual-to-text adapter for MoE-DiffIR, which maps the embedding of low-quality images from the visual domain to the textual domain to serve as textual guidance for SD, enabling more consistent and reasonable texture generation. We also construct a comprehensive benchmark dataset for universal CIR, covering 21 types of degradations from 7 popular traditional and learned codecs. Extensive experiments on universal CIR demonstrate the robustness and texture restoration capability of the proposed MoE-DiffIR. The project can be found at https://renyulin-f.github.io/MoE-DiffIR.github.io/.
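
To make the routing idea concrete, the following is a minimal sketch of how a degradation-aware router could assign basic prompts. It is not the authors' implementation: the abstract does not specify the router's form, so the linear scoring, the top-k selection, the softmax gating, and all names (route_prompts, router_weights, basic_prompts, top_k) are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_prompts(degradation_feat, router_weights, basic_prompts, top_k=2):
    """Mix the top_k basic prompts chosen for one degradation feature.

    Assumed design (hypothetical, not taken from the paper):
      - the router is a single linear layer, one weight row per basic prompt;
      - only the top_k highest-scoring prompts are kept;
      - the kept prompts are combined with softmax-normalized gate weights.
    Returns (mixed_prompt, selected_indices, gate_weights).
    """
    # One routing score per basic prompt: dot product of weight row and feature.
    scores = [
        sum(w * x for w, x in zip(row, degradation_feat))
        for row in router_weights
    ]
    # Indices of the top_k scores, highest first.
    selected = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Normalize only over the selected prompts so the gates sum to 1.
    gates = softmax([scores[i] for i in selected])
    # Gate-weighted sum of the selected prompt vectors.
    dim = len(basic_prompts[0])
    mixed = [
        sum(g * basic_prompts[i][d] for g, i in zip(gates, selected))
        for d in range(dim)
    ]
    return mixed, selected, gates


# Toy example: 4 basic prompts of dimension 2, a 3-dimensional degradation feature.
feat = [1.0, 0.0, 0.0]
router_weights = [[2.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
basic_prompts = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [9.0, 9.0]]
mixed, selected, gates = route_prompts(feat, router_weights, basic_prompts, top_k=2)
print(selected, gates, mixed)
```

In this toy setting, different degradation features activate different subsets of the shared prompt pool, which is one way to read the abstract's claim that basic prompts "cooperate" per compression task. In a real model the feature would come from a learned encoder of the low-quality image and the prompts would be learned tensors.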
