Multi-target and multi-stage liver lesion segmentation and detection in multi-phase computed tomography scans

Abdullah F. Al-Battal, Soan T. M. Duong, Van Ha Tang, Quang Duc Tran, Steven Q. H. Truong, Chien Phan, Truong Q. Nguyen, Cheolhong An

TL;DR

The paper addresses the challenge of liver lesion segmentation in multi-phase CT by proposing a three-stage framework that exploits phase-specific information and multiscale lesion cues. Stage 1 performs heatmap-based localization. Stage 2 combines a main multi-phase segmentation model with per-phase models, built from ConvNext blocks and a Coarse+Fine Feature Fusion & Attention module. Stage 3 performs lesion-wise segmentation correction and refinement. The approach achieves a relative Dice improvement of about 1.6% and reduces subject-wise performance variability by about 8% compared to state-of-the-art baselines, with additional validation on BraTS to demonstrate cross-domain potential. The framework offers practical benefits for radiologists by improving accuracy and consistency, and its design is general enough to augment existing multi-phase or multi-contrast segmentation systems.

Abstract

Multi-phase computed tomography (CT) scans use contrast agents to highlight different anatomical structures within the body, improving the probability of identifying and detecting anatomical structures of interest and abnormalities such as liver lesions. Yet detecting these lesions remains a challenging task because they vary significantly in size, shape, texture, and contrast with respect to surrounding tissue. Radiologists therefore need extensive experience to identify and detect them. Segmentation-based neural networks can assist radiologists with this task. Current state-of-the-art lesion segmentation networks use the encoder-decoder design paradigm based on the UNet architecture, where the multi-phase CT scan volume is fed to the network as a multi-channel input. Although this approach utilizes information from all the phases and outperforms single-phase segmentation networks, we demonstrate that its performance is not optimal and can be further improved by incorporating the learning from models trained on each single phase individually. Our approach comprises three stages. The first stage identifies the regions within the liver where there might be lesions at three different scales (4, 8, and 16 mm). The second stage includes the main segmentation model trained using all the phases as well as a segmentation model trained on each of the phases individually. The third stage uses the multi-phase CT volumes together with the predictions from each of the segmentation models to generate the final segmentation map. Overall, our approach improves liver lesion segmentation performance by a relative 1.6% while reducing performance variability across subjects by 8% when compared to the current state-of-the-art models.

Paper Structure

This paper contains 24 sections, 10 equations, 10 figures, and 2 tables.

Figures (10)

  • Figure 1: Three slices from different subjects in the liver lesion dataset (a)-(c). Each slice is acquired at a different phase and cropped to the liver region. The lesion region of interest is highlighted in a yellow bounding box. Within each of the cropped regions, the boundary of the lesion is outlined in yellow.
  • Figure 2: The proposed framework structure with its different stages. The three outputs of stage 1 are converted to a heatmap that weights the input to stage 2. The outputs of stage 2 are concatenated with the CT image volume from each phase and then fed to stage 3. The structure of the models in stage 2 and model 2 in stage 3, as well as the structure of the Stem, Output, 3D ConvNext, and 3D DownConvNext blocks are outlined in Fig. 3.
  • Figure 3: The architecture and structure of the proposed main segmentation model in stage 2 of the framework (a) with the different building blocks outlined (b): The convolutional block in the encoder and decoder branches, the upsampling convolutional block, the attention and feature fusion block, and the stem and output blocks. The spatial dimension and number of feature channels at the output of each of these blocks are outlined in (a). The structure of model 2 in stage 3 of the framework is the same as the main model in (a), but uses only 3 stages (stages 1, 2, and 3) instead of 4 in the encoder and decoder together with the bottleneck bridge.
  • Figure 4: The structure of the Axial Projected Coarse Attention (APCA) module and the Gated Fine Attention module (GFA), which are the two components of the Coarse+Fine Feature Fusion & Attention Module.
  • Figure 5: The largest axial slice of each liver lesion in the whole dataset (a), the training set (b), and the test set (c) overlaid onto one image at a scale of 1 mm. The boundary of these lesions in the same axial slice for the whole dataset (d), training set (e), and test set (f).
  • ...and 5 more figures
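
The data flow described in the Figure 2 caption can be sketched with NumPy arrays. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the voxel-wise maximum used to merge the three stage-1 outputs, the `floor` parameter, the channel-first layout, and the single channel-wise concatenation are all hypothetical choices, since the text here does not specify them. The segmentation models themselves are left out; only the tensor plumbing between stages is shown.

```python
import numpy as np


def combine_scale_maps(scale_maps):
    """Merge the per-scale stage-1 outputs into one heatmap in [0, 1].

    Assumption: a voxel-wise maximum is used; the merge rule is not
    stated in the text above.
    """
    stacked = np.stack(scale_maps, axis=0)      # (n_scales, D, H, W)
    return np.clip(stacked.max(axis=0), 0.0, 1.0)


def weight_input(ct_phases, heatmap, floor=0.5):
    """Weight every phase of the CT volume by the stage-1 heatmap.

    `floor` is a hypothetical parameter that keeps part of the signal
    in voxels where the heatmap is zero.
    """
    weights = floor + (1.0 - floor) * heatmap   # values in [floor, 1]
    return ct_phases * weights[None, ...]       # broadcast over phases


def build_stage3_input(ct_phases, predictions):
    """Concatenate CT phases with stage-2 predictions along the channel axis."""
    pred_stack = np.stack(predictions, axis=0)  # (n_models, D, H, W)
    return np.concatenate([ct_phases, pred_stack], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ct = rng.random((3, 4, 8, 8))               # 3 phases (assumed), tiny volume
    maps = [rng.random((4, 8, 8)) for _ in range(3)]  # one map per scale
    heat = combine_scale_maps(maps)
    stage2_in = weight_input(ct, heat)
    # Stand-ins for the main model plus one model per phase.
    preds = [rng.random((4, 8, 8)) for _ in range(ct.shape[0] + 1)]
    stage3_in = build_stage3_input(ct, preds)
    print(stage2_in.shape, stage3_in.shape)
```

With P phases, stage 2 in this sketch yields P + 1 prediction maps (one from the main model and one per phase), so the stage-3 input has 2P + 1 channels.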