Structure-Accurate Medical Image Translation via Dynamic Frequency Balance and Knowledge Guidance

Jiahua Xu, Dawei Zhou, Lei Hu, Zaiyi Liu, Nannan Wang, Xinbo Gao

TL;DR

The paper tackles the problem of missing medical imaging modalities by proposing DFBK, a diffusion-based translation framework that preserves anatomical structures. It integrates a Dynamic Frequency Balance module, which uses wavelet decomposition to separately enhance low-frequency anatomy and high-frequency details, with a Knowledge Guidance mechanism that fuses BiomedCLIP priors into the translation process. Implemented on a SwinUNet backbone with ResShift diffusion, DFBK demonstrates superior qualitative and quantitative performance across BraTS 2023, ISLES 2015, and SynthRAD 2023, suggesting robust structure preservation across modalities. Limitations include reliance on high-quality paired data, and future work will explore unpaired translation and broader modality coverage to improve clinical applicability.

Abstract

Multimodal medical images play a crucial role in precise and comprehensive clinical diagnosis. Diffusion models are a powerful strategy for synthesizing the required medical images. However, existing approaches still suffer from anatomical structure distortion caused by overfitting to high-frequency information and weakening of low-frequency information. Thus, we propose a novel method based on dynamic frequency balance and knowledge guidance. Specifically, we first extract the low-frequency and high-frequency components by decomposing the critical features of the model using the wavelet transform. Then, a dynamic frequency balance module is designed to adaptively adjust frequencies, enhancing global low-frequency features and effective high-frequency details while suppressing high-frequency noise. To further overcome the challenges posed by the large differences between medical modalities, we construct a knowledge-guided mechanism that fuses prior clinical knowledge from a visual language model with visual features to facilitate the generation of accurate anatomical structures. Experimental evaluations on multiple datasets show that the proposed method achieves significant improvements in qualitative and quantitative assessments, verifying its effectiveness and superiority.
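The frequency-splitting idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' module: it assumes a single-level Haar wavelet, a single 2D feature map with even height and width, and fixed hand-picked gains and threshold (`low_gain`, `high_gain`, `noise_thresh` are hypothetical parameters), whereas the paper describes the balance as adaptive.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar transform: one low-frequency band, three detail bands."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0          # low-frequency approximation
    d1 = (a - b + c - d) / 2.0          # detail band 1
    d2 = (a + b - c - d) / 2.0          # detail band 2
    d3 = (a - b - c + d) / 2.0          # detail band 3 (diagonal)
    return ll, (d1, d2, d3)

def haar_idwt2(ll, highs):
    """Exact inverse of haar_dwt2."""
    d1, d2, d3 = highs
    out = np.empty((ll.shape[0] * 2, ll.shape[1] * 2), dtype=np.float64)
    out[0::2, 0::2] = (ll + d1 + d2 + d3) / 2.0
    out[0::2, 1::2] = (ll - d1 + d2 - d3) / 2.0
    out[1::2, 0::2] = (ll + d1 - d2 - d3) / 2.0
    out[1::2, 1::2] = (ll - d1 - d2 + d3) / 2.0
    return out

def balance_frequencies(feat, low_gain=1.2, high_gain=1.1, noise_thresh=0.05):
    """Illustrative fixed-weight balance (assumed values, not learned):
    boost the low band, soft-threshold small detail coefficients as noise,
    and scale the detail coefficients that survive."""
    ll, highs = haar_dwt2(np.asarray(feat, dtype=np.float64))
    ll = low_gain * ll
    highs = tuple(
        high_gain * np.sign(h) * np.maximum(np.abs(h) - noise_thresh, 0.0)
        for h in highs
    )
    return haar_idwt2(ll, highs)
```

With `low_gain=1.0`, `high_gain=1.0`, and `noise_thresh=0.0`, the function reduces to an exact decomposition followed by reconstruction, which is a convenient sanity check before substituting learned weights.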

Paper Structure

This paper contains 17 sections, 11 equations, 8 figures, 5 tables, 1 algorithm.

Figures (8)

  • Figure 1: Schematic diagram of our research problem and solution. During medical imaging, certain modalities may be missing due to scanning cost and safety issues. Generating missing modalities via dynamic frequency balance and knowledge guidance can help radiologists make more comprehensive diagnoses.
  • Figure 2: Framework of the proposed method DFBK. In the lower right corner is the Dynamic Frequency Balance module, which achieves frequency balance by adaptively enhancing low-frequency features and effective high-frequency features while suppressing high-frequency noise. In the lower left corner is the Knowledge-Guided mechanism, which fuses the prior knowledge with global low-frequency features. When fusing with high-frequency features, it retains only critical prior knowledge and suppresses irrelevant information to avoid over-enhancing high-frequency features. The embedding is obtained from the text information using the VLM's text encoder. By using the dynamic frequency balance module and the knowledge-guided mechanism, we can generate target-modality images with accurate anatomical structures.
  • Figure 3: Comparison of approaches on BraTS: (a) Ea-GAN, (b) RevGAN, (c) MaskGAN, (d) ResViT, (e) SynDiff, (f) DFBK (ours). The first and third rows show the results of T1 to T2 and T1 to FLAIR translation, respectively. The first column is the source image, and the second column is the target ground truth. The second and fourth rows show the zoomed-in anatomical structures for the first and third rows, respectively. PSNR and SSIM values are shown in the corner of each image. The yellow box indicates the zoomed-in visualization area, and the blue box shows the difference heatmap between the generated image and the ground truth. In the heatmap, color encodes the magnitude of the difference: brighter colors (e.g., red) indicate larger differences, and darker colors (e.g., blue) indicate smaller ones.
  • Figure 4: Comparison of approaches on ISLES: (a) Ea-GAN, (b) RevGAN, (c) MaskGAN, (d) ResViT, (e) SynDiff, (f) DFBK (ours). The first and third rows show the results of FLAIR to DWI and FLAIR to T1 translation, respectively. The first column is the source image, and the second column is the target ground truth. The second and fourth rows show the zoomed-in anatomical structures for the first and third rows, respectively.
  • Figure 5: Comparison of approaches on SynthRAD: (a) Ea-GAN, (b) RevGAN, (c) MaskGAN, (d) ResViT, (e) SynDiff, (f) DFBK (ours). The first and third rows show the results of CT to MR and MR to CT translation, respectively. The first column is the source image, and the second column is the target ground truth. The second and fourth rows show the zoomed-in anatomical structures for the first and third rows, respectively.
  • ...and 3 more figures
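
The PSNR values and difference heatmaps mentioned in the Figure 3 caption can be computed along the following lines. This is a minimal sketch under stated assumptions, not the paper's evaluation code: it assumes both images are arrays of the same shape with intensities normalized to [0, 1] (hence `data_range=1.0`), and the function names are hypothetical.

```python
import numpy as np

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

def difference_map(pred, target):
    """Per-pixel absolute error, suitable for rendering as a heatmap."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return np.abs(pred - target)
```

The array returned by `difference_map` can be passed to any plotting routine with a blue-to-red colormap to obtain a visualization in which larger errors appear as brighter colors.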