Evidential learning driven Breast Tumor Segmentation with Stage-divided Vision-Language Interaction

Jingxing Zhong, Qingtao Pan, Xuchang Zhou, Jiazhen Lin, Xinguo Zhuang

Abstract

Breast cancer is one of the most common causes of death among women worldwide, with millions of fatalities annually. Magnetic Resonance Imaging (MRI) provides various sequences for characterizing tumor morphology and internal patterns, and has become an effective tool for the detection and diagnosis of breast tumors. However, previous deep-learning-based tumor segmentation methods struggle to locate tumor contours accurately because of the low contrast between cancerous and normal areas and because of blurred boundaries. Text prompts hold promise for improving tumor segmentation by describing the regions to be segmented. Inspired by this, we propose a text-guided Breast Tumor Segmentation model (TextBCS) with stage-divided vision-language interaction and evidential learning. Specifically, the proposed stage-divided vision-language interaction facilitates mutual information exchange between visual and text features at each down-sampling stage, further exploiting text prompts to help locate lesion areas in low-contrast scenarios. Moreover, evidential learning is adopted to quantify the model's segmentation uncertainty at blurred boundaries. It uses a variational Dirichlet distribution to characterize the distribution of the segmentation probabilities, addressing the segmentation uncertainty at the boundaries. Extensive experiments validate the superiority of TextBCS over other segmentation networks, showing the best breast tumor segmentation performance on publicly available datasets.
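The abstract does not spell out how the Dirichlet distribution yields an uncertainty value. As a rough illustration only, the sketch below follows a common evidential deep learning formulation: per-class evidence is obtained from the logits with a softplus, the Dirichlet parameters are the evidence plus one, and the uncertainty mass is the number of classes divided by the Dirichlet strength. The softplus activation, the function name `dirichlet_opinion`, and the two-class example are assumptions made for this sketch and are not confirmed to match TextBCS.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable softplus, log(1 + exp(x)); keeps evidence non-negative.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dirichlet_opinion(logits):
    """Turn the per-class logits of ONE pixel into Dirichlet-based outputs.

    Assumed formulation (not taken from the paper):
      evidence e_k = softplus(logit_k)
      alpha_k      = e_k + 1
      strength S   = sum_k alpha_k
      probability  = alpha_k / S   (mean of the Dirichlet)
      belief b_k   = e_k / S
      uncertainty  = K / S         (so that sum_k b_k + uncertainty = 1)
    """
    evidence = [softplus(z) for z in logits]
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    num_classes = len(logits)
    probs = [a / strength for a in alpha]
    beliefs = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return probs, beliefs, uncertainty


if __name__ == "__main__":
    # Hypothetical logits: a clear tumor pixel versus an ambiguous boundary pixel.
    for name, logits in [("clear", [8.0, -8.0]), ("boundary", [0.0, 0.0])]:
        probs, beliefs, u = dirichlet_opinion(logits)
        print(name, [round(p, 3) for p in probs], round(u, 3))
```

Under this formulation, a pixel with little evidence for any class (for example, on a blurred boundary) receives a larger uncertainty mass than a pixel with strong evidence for one class, which is the behavior the abstract attributes to the evidential learning module.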

Paper Structure (22 sections, 15 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Different approaches to breast cancer segmentation. (a) Traditional segmentation methods rely solely on the image to perform segmentation. (b) Our TextBCS leverages text prompts describing cancerous regions to help locate breast cancer areas.
  • Figure 2: Overview of TextBCS, consisting of the SVLI module and the evidential learning module. The SVLI module performs cross fusion and alignment between the image and text features at each downsampling stage, ensuring sufficient image-text interaction. The evidential learning module performs pixel-level uncertainty estimation based on the decoding results, preventing the model from making unreliable predictions.
  • Figure 3: Visual comparison of segmentation results across different methods, such as UNet, nnUNet, TransUNet, CLIP, MGCA and LViT. Each row corresponds to one subject.
  • Figure 4: Saliency maps for the interpretability study of different encoder layers. Introducing interaction between text prompts and images at each stage locates the breast cancer region better.
  • Figure 5: A failure case in which our method, given the text prompt, segments incorrect areas and insufficiently identifies the target area.
  • ...and 1 more figure