Generalized Task-Driven Medical Image Quality Enhancement with Gradient Promotion

Dong Zhang, Kwang-Ting Cheng

TL;DR

This work tackles the gradient conflict between image enhancement and visual recognition models in task-driven medical IQE by introducing GradProm, which splits the system into a mainstream image enhancement model (IP) and an auxiliary visual recognition model (VR). GradProm adds the VR gradients to the IP update only when the two optimization directions align, as measured by the cosine similarity between their gradients, so that the auxiliary task does not bias the IP optimization. The authors give a theoretical argument for convergence to a local minimum of the IP objective and report state-of-the-art results on four medical datasets (denoising and super-resolution) under both supervised and unsupervised VR settings, highlighting cross-domain generalization. The method is architecture- and data-agnostic, which makes it applicable to clinical imaging pipelines and suggests extensions to task-driven learning scenarios beyond IQE.

Abstract

Thanks to the recent achievements in task-driven image quality enhancement (IQE) models like ESTR, the image enhancement model and the visual recognition model can mutually enhance each other's quantitation while producing high-quality processed images that are perceivable by our human vision systems. However, existing task-driven IQE models tend to overlook an underlying fact -- different levels of vision tasks have varying and sometimes conflicting requirements of image features. To address this problem, this paper proposes a generalized gradient promotion (GradProm) training strategy for task-driven IQE of medical images. Specifically, we partition a task-driven IQE system into two sub-models, i.e., a mainstream model for image enhancement and an auxiliary model for visual recognition. During training, GradProm updates only parameters of the image enhancement model using gradients of the visual recognition model and the image enhancement model, but only when gradients of these two sub-models are aligned in the same direction, which is measured by their cosine similarity. In case gradients of these two sub-models are not in the same direction, GradProm only uses the gradient of the image enhancement model to update its parameters. Theoretically, we have proved that the optimization direction of the image enhancement model will not be biased by the auxiliary visual recognition model under the implementation of GradProm. Empirically, extensive experimental results on four public yet challenging medical image datasets demonstrated the superior performance of GradProm over existing state-of-the-art methods.
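
The gating step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cosine_similarity` and `gradprom_step`, the unweighted sum of the two gradients, the threshold of zero, and the plain gradient-descent update are all assumptions made for illustration. Both gradients are assumed to be taken with respect to the same image enhancement parameters and flattened into lists of equal length.

```python
import math


def cosine_similarity(g_a, g_b, eps=1e-12):
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g_a, g_b))
    norm_a = math.sqrt(sum(a * a for a in g_a))
    norm_b = math.sqrt(sum(b * b for b in g_b))
    # eps guards against division by zero when a gradient vanishes.
    return dot / (norm_a * norm_b + eps)


def gradprom_step(theta, g_ip, g_vr, lr=0.1):
    """One hypothetical GradProm-style update of the enhancement parameters.

    theta: enhancement-model parameters (flattened).
    g_ip:  gradient of the enhancement loss w.r.t. theta.
    g_vr:  gradient of the recognition loss w.r.t. theta (assumed available).
    """
    if cosine_similarity(g_ip, g_vr) >= 0.0:
        # Aligned directions: the auxiliary gradient is allowed to contribute.
        direction = [a + b for a, b in zip(g_ip, g_vr)]
    else:
        # Conflicting directions: fall back to the enhancement gradient only.
        direction = list(g_ip)
    return [p - lr * d for p, d in zip(theta, direction)]


# Aligned case: both gradients contribute to the update.
aligned = gradprom_step([0.0, 0.0], [1.0, 0.0], [1.0, 1.0])
# Conflicting case: the recognition gradient is ignored.
conflict = gradprom_step([0.0, 0.0], [1.0, 0.0], [-1.0, 0.0])
```

In the conflicting case the result equals an ordinary gradient step on the enhancement loss alone, which is the property the abstract relies on when it states that the auxiliary model does not bias the enhancement model.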

Paper Structure (17 sections, 1 theorem, 12 equations, 7 figures, 8 tables)

This paper contains 17 sections, 1 theorem, 12 equations, 7 figures, 8 tables.

Key Result

Lemma 3.1

For the given gradient vector fields $G_{IP} = \nabla_{\theta} L_{IP} (\theta)$ and $G_{VR} = \nabla_{\phi} L_{VR}(\phi)$, GradProm reaches a local minimum via its update rule, provided that $\alpha^t$ is sufficiently small, where $t$ and $t+1$ denote the $t$-th and $(t+1)$-th training epochs of the parameter update.
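
Based on the description in the abstract, the update rule can be sketched as the case distinction below. The unweighted sum $G_{IP} + G_{VR}$ and the threshold of zero on the cosine similarity are assumptions, not taken from the lemma's statement; the sketch also assumes that $G_{VR}$ is expressed with respect to the image enhancement parameters $\theta$ so that the two gradients can be added.

$$
\theta^{t+1} =
\begin{cases}
\theta^{t} - \alpha^{t}\,\bigl(G_{IP} + G_{VR}\bigr), & \text{if } \cos\bigl(G_{IP}, G_{VR}\bigr) \ge 0, \\
\theta^{t} - \alpha^{t}\, G_{IP}, & \text{otherwise.}
\end{cases}
$$

In the first case the inner product $\langle G_{IP}, G_{IP} + G_{VR} \rangle = \lVert G_{IP} \rVert^2 + \langle G_{IP}, G_{VR} \rangle$ is at least $\lVert G_{IP} \rVert^2$, so in both cases the step is a descent direction for $L_{IP}$ whenever $G_{IP} \neq 0$.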

Figures (7)

  • Figure 1: The evolution of medical image quality enhancement (IQE) in visual recognition. The IQE model is represented by dots of different colors in the foreground, indicating different categories, while the colorful block in the background represents the decision boundary in the recognition model. In perception-aware IQE (a), samples have a closer representation space, which brings no direct benefit to the decision boundary of the downstream recognition model. Task-driven IQE (b) employs data transfer or gradient transfer to enhance both the upstream IQE model and the downstream recognition model, resulting in a more compact representation space and a clear decision boundary. However, the feature requirements of these two models may be inconsistent (as shown in Figure 2 below), leading to sub-optimal results. Our GradProm in (c) overcomes this limitation, yielding a more compact representation space and a clearer decision boundary.
  • Figure 2: Guided backpropagation visualizations for different vision tasks. For the same input image in (a), we use SR-ResNet (Ledig et al., 2017) for denoising in (b), UNet (Ronneberger et al., 2015) for semantic segmentation in (c), and ResNet (He et al., 2016) for diagnosis in (d). Sample images are from the ISIC 2018 dataset (Codella et al., 2019).
  • Figure 3: (a) Joint training puts the upstream and downstream models together, but ignores that different models have inconsistent feature requirements. (b) Our GradProm dynamically selects parameters that can be updated according to the similarity of different model gradients.
  • Figure 4: Qualitative comparison with the baseline ESTR (Liu et al., 2022) on ISIC 2018 (Codella et al., 2019) for image denoising, with ResNet-50 (He et al., 2016) as the encoder network, where $\sigma$ is set to $0.3$ and $VR_{dia}$ is used as the visual recognition model.
  • Figure 5: Class activation map (CAM) comparison with the baseline ESTR (Liu et al., 2022) on ISIC 2018 (Codella et al., 2019) for image denoising, with ResNet-50 (He et al., 2016) as the encoder network, where $\sigma$ is set to $0.3$ and $VR_{dia}$ is used as the visual recognition model.
  • ...and 2 more figures

Theorems & Definitions (2)

  • Lemma 3.1
  • proof