
DiffLoss: unleashing diffusion model as constraint for training image restoration network

Jiangtong Tan, Feng Zhao

TL;DR

Image restoration must balance perceptual naturalness with semantic fidelity under varying degradations. The authors propose DiffLoss, a training-time, diffusion-based prior that does not increase inference cost, leveraging a fixed unconditional diffusion model to constrain restorations in two ways: (i) naturalness through projection into the diffusion sampling space via a forward diffusion step, and (ii) semantic preservation through h-space bottleneck features. DiffLoss defines two losses, $L_{nat}$ and $L_{sem}$, combined as $L_{DiffLoss}=L_{nat}+\lambda L_{sem}$; together with a standard data-fidelity term, the full objective is $L_{total}=\|x - z\|_2 + \gamma L_{DiffLoss}$. Extensive experiments across low-light enhancement, deraining, and dehazing show that DiffLoss improves naturalness and semantic perception, enabling lightweight restorers to achieve higher perceptual quality without extra inference cost.
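The objective above can be sketched in code. This is a minimal illustration of the loss structure only, not the authors' implementation: `unet` stands in for the frozen diffusion denoiser (assumed to return its noise prediction and h-space bottleneck feature), the default weights `lam` and `gamma` are placeholders, all distances are taken as MSE for simplicity, and comparing the restored output against the ground truth through the same forward-diffused noise is an assumed reading of the paper's projection step.

```python
import math
import random

def forward_diffuse(x0, t, alpha_bars, eps):
    """Closed-form forward step: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps."""
    a = alpha_bars[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * e for v, e in zip(x0, eps)]

def mse(a, b):
    """Mean squared distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

def diffloss_total(restored, target, unet, alpha_bars, t, lam=0.5, gamma=0.1):
    """L_total = ||x - z|| + gamma * (L_nat + lam * L_sem), all terms as MSEs.

    `unet(x_t, t)` is the frozen diffusion model, assumed to return
    (predicted_noise, h_space_feature). `lam`/`gamma` are illustrative weights.
    """
    eps = [random.gauss(0.0, 1.0) for _ in restored]  # shared forward noise
    eps_r, h_r = unet(forward_diffuse(restored, t, alpha_bars, eps), t)
    eps_g, h_g = unet(forward_diffuse(target, t, alpha_bars, eps), t)
    l_nat = mse(eps_r, eps_g)  # naturalness: distance in the noise-prediction space
    l_sem = mse(h_r, h_g)      # semantics: distance between h-space bottleneck features
    return mse(restored, target) + gamma * (l_nat + lam * l_sem)
```

Because the diffusion model is frozen and only consulted at training time, the restoration network deployed at inference is unchanged, which is the source of the "no extra inference cost" claim.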

Abstract

Image restoration aims to enhance low quality images, producing high quality images that exhibit natural visual characteristics and fine semantic attributes. Recently, the diffusion model has emerged as a powerful technique for image generation, and it has been explicitly employed as a backbone in image restoration tasks, yielding excellent results. However, it suffers from the drawbacks of slow inference speed and large model parameters due to its intrinsic characteristics. In this paper, we introduce a new perspective that implicitly leverages the diffusion model to assist the training of image restoration network, called DiffLoss, which drives the restoration results to be optimized for naturalness and semantic-aware visual effect. To achieve this, we utilize the mode coverage capability of the diffusion model to approximate the distribution of natural images and explore its ability to capture image semantic attributes. On the one hand, we extract intermediate noise to leverage its modeling capability of the distribution of natural images, which serves as a naturalness-oriented optimization space. On the other hand, we utilize the bottleneck features of diffusion model to harness its semantic attributes serving as a constraint on semantic level. By combining these two designs, the overall loss function is able to improve the perceptual quality of image restoration, resulting in visually pleasing and semantically enhanced outcomes. To validate the effectiveness of our method, we conduct experiments on various common image restoration tasks and benchmarks. Extensive experimental results demonstrate that our approach enhances the visual quality and semantic perception of the restoration network.

Paper Structure

This paper contains 13 sections, 12 equations, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: (a) Illustration of the effect of our method, converting a pixel-level constraint into a distribution-level constraint. (b) Visual comparison between baselines with and without our DiffLoss. The top row is produced with IAT [cui2022you] on the LOL dataset [wei2018deep] and the bottom with MSBDN [dong2020multi] on the NH-HAZE dataset [ancuti2020nh]. The previous loss is limited to the pixel level and suffers from unnaturalness, with color shift and content artifacts. Our DiffLoss leverages the diffusion model's powerful modeling of the natural-image distribution, yielding more natural restoration results. Note that DiffLoss is an optimization strategy: the improvement should be compared against the baseline method, not against other restoration methods or the ground-truth images.
  • Figure 2: As the h-space changes, the image gradually loses its original semantics, from ① to ④ in (a). We also use the output features of a ResNet50 network to measure the distance between high-level features of images with different types of degradation, showing the change in semantic attributes, as depicted in the histogram (b). As the h-space perturbation increases, clean images and images restored with DiffLoss exhibit systematic variations in semantic attributes, while degraded images and images restored without DiffLoss show minimal change. This means degradations undermine the semantic attributes of images, and our DiffLoss can restore them. Note that the diffusion model and ResNet50 are both trained on the ImageNet dataset.
  • Figure 3: Overview of our method. The parameters of DiffLoss are frozen during the training stage. Any existing restoration network can be trained with the aid of our DiffLoss to achieve better natural visual and semantic performance. More implementation details of DiffLoss can be found in Figure 4. During the inference stage, only the optimized restoration network is used; DiffLoss is not involved.
  • Figure 4: Detailed design of DiffLoss. We devise our DiffLoss with a $t$-step forward process and a one-step reverse process. The $t$-step forward process is computed directly in closed form with Eq. (7). The noisy images are then fed into the denoising UNet, which projects them into intermediate noise via Eq. (4). We also take the h-space vector from the bottleneck of the UNet, which contains semantic information, as described in Eqs. (12) and (13). DiffLoss is designed to pull the outputs of the denoising UNet and the bottleneck features closer together.
  • Figure 5: Comparison of visual results on the Rain100H and LOL datasets. (a) EfDeRain; (b) RCD-Net; (c) IAT; (d) DeepLPF. Please zoom in for the best view.
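The $t$-step forward process and one-step reverse process described in the Figure 4 caption are the standard DDPM relations, sketched below for a scalar pixel. The linear beta schedule with the common DDPM defaults is an assumption for illustration; the function names are hypothetical and the paper's own equation numbers (7, 4, 12, 13) refer to its full formulation.

```python
import math

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule; the endpoint values are the common DDPM defaults."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]

def alpha_bar(betas, t):
    """Cumulative product  abar_t = prod_{s<=t} (1 - beta_s)."""
    p = 1.0
    for s in range(t + 1):
        p *= 1.0 - betas[s]
    return p

def q_sample(x0, t, betas, eps):
    """t-step forward process in one closed-form step:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a = alpha_bar(betas, t)
    return math.sqrt(a) * x0 + math.sqrt(1.0 - a) * eps

def estimate_x0(xt, t, betas, eps_pred):
    """One-step reverse estimate: invert q_sample given the predicted noise."""
    a = alpha_bar(betas, t)
    return (xt - math.sqrt(1.0 - a) * eps_pred) / math.sqrt(a)
```

With the true noise, `estimate_x0` recovers `x0` exactly; in DiffLoss the noise comes from the frozen denoising UNet, so the intermediate noise itself serves as the naturalness-oriented optimization space.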