PECI-Net: Bolus segmentation from video fluoroscopic swallowing study images using preprocessing ensemble and cascaded inference

Dougho Park, Younghun Kim, Harim Kang, Junmyeoung Lee, Jinyoung Choi, Taeyeon Kim, Sangeok Lee, Seokil Son, Minsol Kim, Injung Kim

TL;DR

PECI-Net is a network architecture for VFSS image analysis that combines two novel techniques: the preprocessing ensemble network (PEN) and the cascaded inference network (CIN). Ablation studies confirm that both CIN and PEN improve bolus segmentation performance.

Abstract

Bolus segmentation is crucial for the automated detection of swallowing disorders in videofluoroscopic swallowing studies (VFSS). However, it is difficult for a model to accurately segment a bolus region in a VFSS image because VFSS images are translucent, have low contrast and unclear region boundaries, and lack color information. To overcome these challenges, we propose PECI-Net, a network architecture for VFSS image analysis that combines two novel techniques: the preprocessing ensemble network (PEN) and the cascaded inference network (CIN). PEN enhances the sharpness and contrast of the VFSS image by combining multiple preprocessing algorithms in a learnable way. CIN reduces ambiguity in bolus segmentation by using context from other regions through cascaded inference. Moreover, CIN prevents undesirable side effects from unreliably segmented regions by referring to the context in an asymmetric way. In experiments, PECI-Net exhibited higher performance than four recently developed baseline models, outperforming TernausNet, the best among the baseline models, by 4.54% and the widely used UNet by 10.83%. The results of the ablation studies confirm that CIN and PEN are effective in improving bolus segmentation performance.
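To make the idea of "combining multiple preprocessing algorithms in a learnable way" concrete, the following is a minimal sketch and not the authors' PEN. It assumes the outputs of several preprocessing algorithms are already available as same-sized grayscale images, and it mixes them into three output channels with softmax-normalized weights. The function names (`softmax`, `mix_preprocessed`), the softmax parameterization, and the toy data are illustrative assumptions. In a real model the logits would be trainable parameters optimized together with the segmentation network, and the actual PEN may use a more expressive mixing network.

```python
import math

def softmax(logits):
    """Turn a list of logits into non-negative weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mix_preprocessed(images, logits_per_channel):
    """Mix K preprocessed grayscale images into C output channels.

    images: list of K images, each a list of rows of floats (same size).
    logits_per_channel: list of C lists, each holding K logits
        (hypothetical stand-ins for learnable parameters).
    Each output channel is a convex combination of the K inputs.
    """
    height, width = len(images[0]), len(images[0][0])
    output = []
    for logits in logits_per_channel:
        weights = softmax(logits)
        channel = [
            [sum(w * img[r][c] for w, img in zip(weights, images))
             for c in range(width)]
            for r in range(height)
        ]
        output.append(channel)
    return output

# Toy 1x2 "images" standing in for the results of two preprocessing algorithms.
sharpened = [[0.0, 1.0]]
equalized = [[1.0, 0.0]]

# Three channels: an even blend, mostly "sharpened", mostly "equalized".
channels = mix_preprocessed(
    [sharpened, equalized],
    [[0.0, 0.0], [5.0, -5.0], [-5.0, 5.0]],
)
```

Because each channel can weight the preprocessing results differently, the downstream segmentation network receives several complementary enhanced views of the same frame rather than a single hand-picked one.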

Paper Structure (29 sections, 11 equations, 8 figures, 9 tables)

Figures (8)

  • Figure 1: An example of a VFSS image with low contrast and the ground truth (G.T.) bolus region.
  • Figure 2: The architecture of PECI-Net. (a) Preprocessing Ensemble Network (PEN); (b) Cascaded Inference Network (CIN)
  • Figure 3: The enhanced images and bolus segmentation results according to the preprocessing algorithm. (a) input image, (b) G.T. bolus region, (c) Laplacian sharpening, (d) CLAHE, and (e) PEN. PEN outputs a three-channel enhanced image, as in Fig. 4; (e) displays the average of the three channels to save space.
  • Figure 4: Preprocessing ensemble. (a) input images, (b)-(f): the results of preprocessing algorithms, (g) the output of PEN (3 channels).
  • Figure 5: GradCAM of the four decoder blocks of TransUNet. Red represents the highest values, and purple represents the lowest values. The pixels in the cervical spine and mandible regions are assigned high importance values.
  • ...and 3 more figures