
K-Space-Aware Cross-Modality Score for Synthesized Neuroimage Quality Assessment

Guoyang Xie, Jinbao Wang, Yawen Huang, Jiayi Lyu, Feng Zheng, Yefeng Zheng, Yaochu Jin

TL;DR

K-CROSS introduces a lesion- and frequency-aware metric for assessing cross-modality neuroimage synthesis by jointly modeling tumor regions, k-space information, and shared anatomical structure. It uses a two-stage training regime with a complex U‑Net for k-space features, a modality-specific tumor encoder, a structure encoder shared across modalities, and two score networks that predict quality scores aligned with radiologist judgments. The method is validated on the NIRPS dataset of 6,000 radiologist judgments, where it agrees with expert assessments better than traditional metrics such as PSNR/SSIM and other IQA baselines, particularly in capturing MR-specific properties. The work contributes a scalable MRI-informed evaluation framework and a large radiologist-annotated dataset, with implications for cross-modality synthesis assessment in clinical settings and for image generation tasks beyond MRI.
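For readers less familiar with MRI: k-space is the spatial-frequency domain in which MR data are acquired, and an image and its k-space are related by a 2D Fourier transform. The Python/NumPy sketch below shows that mapping and one common way to present complex k-space to a network (real and imaginary parts stacked as two channels). The function names and the two-channel layout are illustrative assumptions, not the paper's published preprocessing.

```python
import numpy as np

def to_kspace(image: np.ndarray) -> np.ndarray:
    """Map a 2D image to centered k-space with an orthonormal 2D FFT.

    fftshift moves the zero-frequency (DC) component to the array centre,
    which is how k-space is usually displayed for MRI.
    """
    return np.fft.fftshift(np.fft.fft2(image, norm="ortho"))

def from_kspace(kspace: np.ndarray) -> np.ndarray:
    """Invert to_kspace; for a real input image the imaginary part is ~0."""
    return np.fft.ifft2(np.fft.ifftshift(kspace), norm="ortho").real

def complex_channels(kspace: np.ndarray) -> np.ndarray:
    """Hypothetical input layout: real and imaginary parts as two channels."""
    return np.stack([kspace.real, kspace.imag], axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    slice_ = rng.random((64, 64))  # stand-in for one MR slice
    k = to_kspace(slice_)
    print(k.dtype, complex_channels(k).shape)    # complex128 (2, 64, 64)
    print(np.allclose(from_kspace(k), slice_))  # True
```

With `norm="ortho"` the transform is unitary, so the round trip is exact up to floating-point error and total signal energy is preserved between the two domains.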

Abstract

How to assess cross-modality medical image synthesis has been largely unexplored. The most widely used measures, such as PSNR and SSIM, focus on structural features but neglect lesion location and the fundamental k-space characteristics of medical images. To address this, we propose a new metric, K-CROSS, to spur progress on this challenging problem. Specifically, K-CROSS uses a pre-trained multi-modality segmentation network to predict the lesion location, together with a tumor encoder that represents features such as texture details and brightness intensities. To further reflect frequency-specific information grounded in magnetic resonance imaging principles, both k-space features and vision features are extracted and fed into our comprehensive encoders, which are trained with a frequency reconstruction penalty. Structure-shared encoders, constrained by a similarity loss, capture the intrinsic structural information common to both modalities. As a result, features from lesion regions, k-space, and anatomical structures are all captured and serve as our quality evaluators. We evaluate performance on a newly constructed large-scale cross-modality neuroimaging perceptual similarity (NIRPS) dataset with 6,000 radiologist judgments. Extensive experiments demonstrate that the proposed method outperforms other metrics, particularly in its agreement with radiologists' judgments on NIRPS.
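The abstract names two training constraints without giving formulas: a frequency reconstruction penalty and a similarity loss on the shared structure encoders. The NumPy sketch below shows one plausible form of each, assuming an L1 distance between k-spaces for the frequency penalty and one minus cosine similarity for the structure similarity loss. Both function names and both formulas are hypothetical stand-ins; the paper's exact definitions may differ.

```python
import numpy as np

def frequency_penalty(recon: np.ndarray, target: np.ndarray) -> float:
    """Hypothetical frequency reconstruction penalty: mean absolute
    difference between the orthonormal 2D FFTs (k-space) of a
    reconstruction and its target image."""
    k_recon = np.fft.fft2(recon, norm="ortho")
    k_target = np.fft.fft2(target, norm="ortho")
    return float(np.mean(np.abs(k_recon - k_target)))

def similarity_loss(feat_a: np.ndarray, feat_b: np.ndarray, eps: float = 1e-8) -> float:
    """Hypothetical similarity loss: 1 - cosine similarity between the
    shared-structure features of the two modalities (0 when parallel,
    2 when opposite). eps guards against division by zero."""
    a, b = feat_a.ravel(), feat_b.ravel()
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return float(1.0 - cos)
```

Because the orthonormal FFT is unitary, an L2 penalty in k-space would equal the pixel-space L2 error (Parseval's theorem); the L1 form used here does differ from its pixel-space counterpart, and a k-space loss also makes it straightforward to reweight specific frequency bands.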

Paper Structure

This paper contains 37 sections, 20 equations, 10 figures, 8 tables, and 4 algorithms.

Figures (10)

  • Figure 1: K-CROSS vs. PSNR vs. SSIM. The first and second columns on the left show the source-modality data and the synthesized target-modality data, respectively. The zoomed-in view shows the modality-specific tumor region, provided by the pre-trained multi-modality neuroimage segmentation network. The numbers on the right are PSNR, SSIM, the radiologist score, and our K-CROSS value. PSNR, SSIM, and K-CROSS are rescaled (to 1.0000) for comparison with the radiologist's score. In terms of lesion region and k-space measurement, K-CROSS is more consistent with the radiologist's score than PSNR and SSIM.
  • Figure 2: Flowchart of our proposed K-CROSS. First-stage input: a reference neuroimage (source modality) and its query neuroimage (target modality); the private portion is indicated by a blue box. The structure encoder captures the neuroimage's structural features, and the structural reconstruction is evaluated with $L_{stru}$. A similarity loss $L_{simi}$ keeps the shared structure representation consistent across modalities. The Complex U-Net captures the modality-specific k-space features and is optimized with $L_{freq}$. SegNet extracts the mask of the modality-specific tumor region from each neuroimage. SegNet is an off-the-shelf network, such as nnUNet Isensee2020nnUNetAS, TransUNet Chen2021TransUNetTM, or SwinUNet Cao2021SwinUnetUP, and its parameters are not updated during training. The tumor encoder learns to represent the masked tumor region, particularly its texture details and brightness level, and the quality of the reconstructed tumor region is constrained by the loss $L_{tumor}$. First-stage output: the private tumor encoder, the private complex encoder, and the shared structure encoder for both modalities. Second-stage input: the query-modality neuroimage together with the modality-specific tumor encoder, complex encoder, and shared structure encoder from the first stage. The two main components of K-CROSS are $\eta_{complex}$, produced by the complex score network, and $\eta_{nature}$, produced by the natural score network. The outputs of the tumor encoder and the structure encoder are combined to form the input of the natural score network, and $\eta_{total}$ is the average of $\eta_{complex}$ and $\eta_{nature}$. During training, K-CROSS fits $\eta_{total}$ with a straightforward regression model, using labels from the NIRPS dataset.
Inference: the inputs are the query-modality neuroimage together with the modality-specific tumor encoder, complex encoder, and shared structure encoder from the first stage. The output score is $\eta_{total}$.
  • Figure 3: The architecture of the complex encoder and decoder. The encoder consists of Complex Conv2d (Section \ref{sec:complex_conv}) and Complex BatchNorm (Section \ref{sec:complex_bn}) layers. The decoder contains ComplexConvTranspose2d and Complex BatchNorm layers; ComplexConvTranspose2d is analogous to Complex Conv2d but uses a transposed convolution in place of the standard convolution. The middle part includes Complex Upsample (Section \ref{sec:complex_up}), Complex Tanh (Section \ref{sec:complex_tanh}), and Complex Conv2d.
  • Figure 4: Illustration of complex convolution.
  • Figure 5: Illustration of complex operator.
  • ...and 5 more figures
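The second-stage scoring described in the Figure 2 caption can be sketched as follows. The caption states only that the tumor- and structure-encoder outputs are "combined", that $\eta_{total}$ is the average of $\eta_{complex}$ and $\eta_{nature}$, and that a regression model is fit to NIRPS labels. Concatenation as the combination step and mean squared error as the regression loss are assumptions made here for illustration, and the function names are hypothetical.

```python
import numpy as np

def fuse_features(tumor_feat: np.ndarray, structure_feat: np.ndarray) -> np.ndarray:
    """Hypothetical input to the natural score network: tumor-encoder and
    structure-encoder outputs concatenated along the feature axis."""
    return np.concatenate([tumor_feat, structure_feat], axis=-1)

def total_score(eta_complex: np.ndarray, eta_nature: np.ndarray) -> np.ndarray:
    """eta_total as the average of the two branch scores (per Figure 2)."""
    return 0.5 * (eta_complex + eta_nature)

def regression_loss(eta_total: np.ndarray, radiologist_scores: np.ndarray) -> float:
    """Hypothetical mean-squared-error regression loss against
    radiologist labels such as those in NIRPS."""
    return float(np.mean((eta_total - radiologist_scores) ** 2))
```

Averaging keeps $\eta_{total}$ on the same scale as the two branch scores, so neither the k-space branch nor the lesion/structure branch dominates unless the regression training pushes it to.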
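Figures 3 and 4 refer to Complex Conv2d. A standard way to build a complex convolution from real-valued operations is to split both kernel and input into real and imaginary parts, since $(W_r + iW_i) * (x_r + ix_i) = (W_r * x_r - W_i * x_i) + i(W_r * x_i + W_i * x_r)$. The single-channel NumPy sketch below assumes this formulation with stride 1 and no padding; the paper's layer may differ in channel count, padding, or bias, and the helper names are hypothetical.

```python
import numpy as np

def correlate2d_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Plain 2D cross-correlation ('valid' padding, stride 1), i.e. what
    deep-learning libraries call convolution. Works for real or complex input."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.result_type(x, w))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def complex_conv2d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Complex convolution built from four real-valued convolutions:
    real part = W_r*x_r - W_i*x_i, imaginary part = W_r*x_i + W_i*x_r."""
    xr, xi = x.real, x.imag
    wr, wi = w.real, w.imag
    real = correlate2d_valid(xr, wr) - correlate2d_valid(xi, wi)
    imag = correlate2d_valid(xi, wr) + correlate2d_valid(xr, wi)
    return real + 1j * imag
```

The four real convolutions use only two real kernels ($W_r$ and $W_i$), each applied to both the real and imaginary inputs, which is why complex layers of this kind can be assembled from ordinary real-valued convolution layers.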