Submanifold Sparse Convolutional Networks for Automated 3D Segmentation of Kidneys and Kidney Tumours in Computed Tomography

Saúl Alonso-Monsalve, Leigh H. Whitehead, Adam Aurisano, Lorena Escudero Sanchez

TL;DR

A new two-stage methodology that combines voxel sparsification with submanifold sparse convolutional networks. It allows segmentation to be performed on high-resolution inputs with a native 3D model architecture, obtaining state-of-the-art accuracies while significantly reducing the GPU memory and time required.

Abstract

The accurate delineation of tumours in radiological images such as Computed Tomography is a highly specialised and time-consuming task, and is currently a bottleneck preventing quantitative analyses from being performed routinely in the clinical setting. For this reason, developing methods for the automated segmentation of tumours in medical imaging is of the utmost importance and has driven significant efforts in recent years. However, processing full 3D scans is often impractical given the large number of voxels to be analysed, so traditional convolutional neural networks usually require such images to be downsampled or split into patches. To overcome this problem, in this paper we propose a new two-stage methodology that combines voxel sparsification with submanifold sparse convolutional networks. This method allows segmentation to be performed on high-resolution inputs with a native 3D model architecture, obtaining state-of-the-art accuracies while significantly reducing the computational resources needed in terms of GPU memory and time. We studied the deployment of this methodology in the context of Computed Tomography images of renal cancer patients from the KiTS23 challenge, and our method achieved results competitive with the challenge winners, with Dice similarity coefficients of 95.8% for kidneys + masses, 85.7% for tumours + cysts, and 80.3% for tumours alone. Crucially, our method also offers significant computational improvements, achieving up to a 60% reduction in inference time and up to a 75% reduction in VRAM usage compared to an equivalent dense architecture, across both CPU and the various GPU cards tested.
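For readers unfamiliar with the metric quoted in the abstract, the Dice similarity coefficient between a predicted and a ground-truth segmentation is 2|P ∩ T| / (|P| + |T|). The following is a minimal sketch, not the authors' evaluation code: it assumes each segmentation is stored sparsely as a set of (z, y, x) voxel coordinates, and the function name `dice` and the convention of returning 1.0 when both sets are empty are illustrative choices.

```python
def dice(pred, truth):
    """Dice similarity coefficient between two sets of voxel coordinates.

    `pred` and `truth` are sets of (z, y, x) tuples marking foreground voxels.
    Returning 1.0 when both sets are empty is an assumed convention.
    """
    if not pred and not truth:
        return 1.0
    overlap = len(pred & truth)  # voxels labelled foreground in both
    return 2 * overlap / (len(pred) + len(truth))


# Toy example: 4 ground-truth voxels, 4 predicted voxels, 3 shared.
truth = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
pred = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 1)}
score = dice(pred, truth)  # 2 * 3 / (4 + 4) = 0.75
```

A score of 1.0 means the two voxel sets coincide exactly, and 0.0 means they do not overlap at all.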

Paper Structure

This paper contains 18 sections, 2 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Overview of the two-stage segmentation framework utilising submanifold sparse convolutional networks (SSCNs). In Stage 1 (ROI finder), a low-resolution sparsified image is processed by a sparse 3D U-Net to identify a region of interest (ROI). The ROI is then conservatively dilated to ensure all relevant structures are included, and passed to the next stage, where it is used to crop the original high-resolution image. In Stage 2 (Segmentation), the high-resolution cropped image is fed into another sparse 3D U-Net to obtain the final segmentations. This two-step approach efficiently reduces computational cost while maintaining segmentation accuracy by focusing on relevant anatomical structures. The segmentation outputs include kidneys, tumours and cysts, independently segmented following the ground truth annotations from the KiTS23 dataset.
  • Figure 2: Sparsification: the cumulative fraction of voxels removed by applying a minimum (top) and maximum (bottom) threshold to the voxel intensity in Hounsfield Units (HU), shown for the kidney and masses (red) and all other voxels (blue). The green arrows show the chosen regions rejected in the sparsification process.
  • Figure 3: The proposed 3D sparse U-Net architecture. The network follows a hierarchical encoder-decoder structure with progressively increasing feature dimensions in the encoder and corresponding feature reduction in the decoder. Downsampling and upsampling operations are performed with convolution and transposed convolution layers, respectively, while skip connections are implemented via element-wise summation to maintain parameter efficiency.
  • Figure 4: Validation losses for Stage 1 (top) and Stage 2 (bottom). Each column represents the accumulated Dice loss across all deep supervision steps for different outputs: kidneys + masses (left), tumour + cyst (middle), and tumour only (right). Different colours indicate different folds.
  • Figure 5: From left to right: (first column) 2D slice from the original high-resolution scan showing the ground truth segmentations; (second column) predicted segmentations from Stage 1 on the low-resolution version of the scan; (third column) predicted segmentations from Stage 2 on the high-resolution version of the scan. Each row shows a single 2D slice selected from a different case (patient).
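The sparsification and ROI-cropping steps described in the captions of Figures 1 and 2 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the HU thresholds (-200 and 500), the margin of 2 voxels, and the function names `sparsify` and `dilated_bounding_box` are all hypothetical, the dilation is approximated by growing a bounding box rather than by a morphological operation, and the ROI mask is assumed to already live on the high-resolution grid.

```python
import numpy as np


def sparsify(volume, hu_min, hu_max):
    """Keep only voxels whose intensity (in HU) lies in [hu_min, hu_max].

    Returns the (N, 3) integer coordinates of the kept voxels and their
    (N,) intensities, in matching order. A sparse network would consume
    this coordinate/feature pair instead of the dense volume.
    """
    mask = (volume >= hu_min) & (volume <= hu_max)
    coords = np.argwhere(mask)
    feats = volume[mask]
    return coords, feats


def dilated_bounding_box(roi_mask, margin):
    """Bounding box of a binary ROI mask, grown by `margin` voxels per side
    and clipped to the volume. Returns a tuple of slices for cropping."""
    idx = np.argwhere(roi_mask)
    lo = np.maximum(idx.min(axis=0) - margin, 0)
    hi = np.minimum(idx.max(axis=0) + 1 + margin, roi_mask.shape)
    return tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))


# Toy 8x8x8 "scan" with random intensities (hypothetical data).
rng = np.random.default_rng(0)
volume = rng.integers(-1000, 1500, size=(8, 8, 8)).astype(np.int16)

# Sparsification with hypothetical thresholds.
coords, feats = sparsify(volume, hu_min=-200, hu_max=500)

# Stand-in for a Stage 1 prediction: a small block of ROI voxels.
roi = np.zeros(volume.shape, dtype=bool)
roi[3:5, 2:4, 6:8] = True

# Conservative crop that would be passed on to Stage 2.
box = dilated_bounding_box(roi, margin=2)
crop = volume[box]
```

Because only the surviving voxels are stored and processed, memory and compute scale with the number of kept voxels rather than with the full volume, which is the motivation for applying the intensity thresholds before the network.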