sFRC for assessing hallucinations in medical image restoration

Prabhat Kc, Rongping Zeng, Nirmal Soni, Aldo Badano

TL;DR

This work proposes sFRC: Fourier Ring Correlation (FRC) analysis performed over small patches that are scanned concomitantly across DL outputs and their reference counterparts to detect hallucinations. It describes the rationale behind sFRC and provides its mathematical formulation.

Abstract

Deep learning (DL) methods are currently being explored to restore images from sparse-view-, limited-data-, and undersampled-based acquisitions in medical applications. Although outputs from DL may appear visually appealing based on likability/subjective criteria (such as less noise, smooth features), they may also suffer from hallucinations. This issue is further exacerbated by a lack of easy-to-use techniques and robust metrics for the identification of hallucinations in DL outputs. In this work, we propose performing Fourier Ring Correlation (FRC) analysis over small patches and concomitantly (s)canning across DL outputs and their reference counterparts to detect hallucinations (termed as sFRC). We describe the rationale behind sFRC and provide its mathematical formulation. The parameters essential to sFRC may be set using predefined hallucinated features annotated by subject matter experts or using imaging theory-based hallucination maps. We use sFRC to detect hallucinations for three undersampled medical imaging problems: CT super-resolution, CT sparse view, and MRI subsampled restoration. In the testing phase, we demonstrate sFRC's effectiveness in detecting hallucinated features for the CT problem and sFRC's agreement with imaging theory-based outputs on hallucinated feature maps for the MR problem. Finally, we quantify the hallucination rates of DL methods on in-distribution versus out-of-distribution data and under increasing subsampling rates to characterize the robustness of DL methods. Beyond DL-based methods, sFRC's effectiveness in detecting hallucinations for a conventional regularization-based restoration method and a state-of-the-art unrolled method is also shown.
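The patch-wise scan described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the patch size, the stride, and the default values of `frc_threshold` and `x_ht` are hypothetical placeholders. It also assumes that the quantity compared against the hallucination threshold is the first spatial frequency (expressed here as a fraction of Nyquist) at which a patch's FRC curve falls below the FRC threshold.

```python
import numpy as np

def frc_curve(a, b):
    """Fourier Ring Correlation between two equally sized square patches."""
    n = a.shape[0]
    fa = np.fft.fftshift(np.fft.fft2(a))
    fb = np.fft.fftshift(np.fft.fft2(b))
    # Integer ring index of every Fourier coefficient (distance from DC).
    yy, xx = np.indices(a.shape)
    radius = np.hypot(yy - n // 2, xx - n // 2).round().astype(int)
    n_rings = n // 2
    keep = radius < n_rings
    idx = radius[keep]
    # Per-ring sums of the cross-spectrum and of the two power spectra.
    cross = np.bincount(idx, weights=(fa * np.conj(fb)).real[keep], minlength=n_rings)
    pa = np.bincount(idx, weights=(np.abs(fa) ** 2)[keep], minlength=n_rings)
    pb = np.bincount(idx, weights=(np.abs(fb) ** 2)[keep], minlength=n_rings)
    frc = cross / np.sqrt(pa * pb + 1e-12)  # small constant avoids 0/0
    freqs = np.arange(n_rings) / n_rings    # assumed unit: fraction of Nyquist
    return freqs, frc

def crossing_frequency(freqs, frc, frc_threshold):
    """First frequency where the FRC curve falls below the threshold (1.0 if never)."""
    below = np.nonzero(frc < frc_threshold)[0]
    return float(freqs[below[0]]) if below.size else 1.0

def sfrc_scan(output, reference, patch=32, stride=32, frc_threshold=0.5, x_ht=0.2):
    """Scan co-located patches and flag those whose crossing frequency is <= x_ht."""
    flagged = []
    h, w = output.shape
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            p_out = output[top:top + patch, left:left + patch]
            p_ref = reference[top:top + patch, left:left + patch]
            freqs, frc = frc_curve(p_out, p_ref)
            x_ct = crossing_frequency(freqs, frc, frc_threshold)
            if x_ct <= x_ht:
                flagged.append((top, left, x_ct))
    return flagged
```

Calling `sfrc_scan(dl_output, reference)` on two co-registered 2D arrays returns the top-left coordinates of the flagged patches together with their crossing frequencies. In practice both thresholds would be tuned on annotated or theory-derived hallucination maps, as the abstract notes.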

Paper Structure (36 sections, 5 equations, 14 figures, 4 tables, 1 algorithm)

Figures (14)

  • Figure 1: A range of artifacts, including (a) a patient implant-based artifact (ldct_data_2016), (b) missing wedge-based acquisition artifacts (described in SI section S2), and (c) hallucinations due to the application of an AI-assisted SRGAN model (described in section \ref{sec:srgan_model}) to upsample the low-resolution counterpart of (d) by a factor of $4$. As opposed to the artifacts in (a) and (b), the hallucinations in (c), in the yellow box (two loops of bowel instead of one contiguous loop) and in the green box (an unwarranted plaque-like structure), only become concretely apparent after a thorough comparison against the normal-resolution reference in (d). The display windows are (W:$\,1402$ L:$\,-1358$) for (a), (W:$\,780$ L:$\,260$) for (b), and (W:$\,700$ L:$\,50$) for (c-d).
  • Figure 2: Decomposition of a patch containing a hallucinated structure in (g), indicated by a red arrow, into its frequency-based components in (h-l). (m) is a normal-resolution CT patch. (g) is obtained by upsampling the low-resolution (downsampled four times) counterpart of (m) using the SRGAN described in section \ref{sec:srgan_model}. (a) is the Fourier transform of (g). (b-f) are different bandpass filters. (h-l) are the image components of (g) obtained by convolving the bandpass filters indicated in (b-f) with (g). Similarly, (n-r) are the image components of (m) obtained by convolving the bandpass filters shown in (b-f) with (m).
  • Figure 3: A visual depiction of our sFRC analysis to detect hallucinated regions. Complementary pairs of patches (i.e., over the same x-y coordinates) of the same dimension are scanned across the image pairs from the two methods (reference and deep learning images) in (a). FRC is calculated in (b) across all patch pairs from (a). In addition to the FRC threshold used in a typical FRC calculation, we introduce a new threshold called $x_{h_{t}}$, a line parallel to the $y$-axis. Patches from the deep learning method whose FRC comparison against their reference counterparts leads to $x_{c_{t}} \leq x_{h_{t}}$ are labeled as candidates for exhibiting hallucinations. As shown in fig. \ref{img:set_xht_general}, $x_{h_{t}}$ is set a priori using hallucinated ROIs/maps estimated by an imaging theory or annotated by subject matter experts on a tuning or development set.
  • Figure 4: An illustration of setting $x_{h_{t}}$ as an upper bound of the $x_{c_{t}}$ values, using patches that are labeled as hallucinations by subject matter experts or by an imaging theory.
  • Figure 5: An illustration of the imaging theory-based approach to setting $x_{h_{t}}$. In our sFRC analysis of the MRI output post-processed using a U-Net in (c), $0.75$ is set as the FRC threshold and $0.16$ as $x_{h_{t}}$ such that the patches labeled as hallucinations by our analysis (indicated by the red bounding boxes in (d)) overlap with the false structures obtained using the theoretical approach proposed by Bhadra et al. (varun_hallu).
  • ...and 9 more figures
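
The frequency-band decomposition described in the Figure 2 caption can be illustrated with a short sketch. It assumes ideal (hard-edged) ring-shaped bandpass masks with equally spaced band edges, which may differ from the filters the authors actually use. The function name and the choice of five bands are hypothetical. Multiplying the spectrum by a mask is equivalent to convolving the image with the corresponding bandpass filter.

```python
import numpy as np

def ring_band_components(image, n_bands=5):
    """Split a 2D image into components from concentric frequency bands."""
    h, w = image.shape
    # Radial spatial frequency (cycles per pixel) of every Fourier coefficient.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.hypot(fy, fx)
    # Assumption: equally spaced band edges from DC to the highest frequency.
    edges = np.linspace(0.0, radius.max(), n_bands + 1)
    spectrum = np.fft.fft2(image)
    components = []
    for i in range(n_bands):
        lo, hi = edges[i], edges[i + 1]
        if i == n_bands - 1:
            mask = (radius >= lo) & (radius <= hi)  # last band includes its upper edge
        else:
            mask = (radius >= lo) & (radius < hi)
        # Masking in the Fourier domain = bandpass filtering in the image domain.
        components.append(np.fft.ifft2(spectrum * mask).real)
    return components
```

Because the masks partition the frequency plane, the components sum back to the input patch. Comparing the same band of a restored patch and its reference shows in which frequency range the two begin to disagree, which is the intuition that the FRC-based analysis quantifies.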