Fast Multi-Organ Fine Segmentation in CT Images with Hierarchical Sparse Sampling and Residual Transformer

Xueqi Guo, Halid Ziya Yerebakan, Yoshihisa Shinagawa, Kritika Iyer, Gerardo Hermosillo Valadez

TL;DR

The paper addresses fast, voxel-level multi-organ CT segmentation with a fine segmentation framework that combines hierarchical sparse sampling with a Residual Transformer. The approach builds multi-resolution sparse descriptors and decodes them through Transformer-based tokens, so that a full-volume segmentation can be reconstructed from sparse queries on a CPU. Results on internal and public datasets show improved segmentation performance over fast organ classifiers, with CPU inference of around 2.24 seconds per volume, which approaches real-time operation. The method has clinical potential for real-time workflows such as scan registration, lesion detection, and landmarking without reliance on GPU acceleration.

Abstract

Multi-organ segmentation of 3D medical images is a fundamental task with applications in many clinical automation pipelines. Although deep learning achieves superior accuracy, segmenting an entire 3D volume voxel by voxel with neural networks can consume substantial time and memory. Classifiers have been developed as an alternative for cases that involve only certain points of interest, but the trade-off between speed and accuracy remains an issue. We therefore propose a fast multi-organ segmentation framework that uses hierarchical sparse sampling and a Residual Transformer. Compared with whole-volume analysis, the hierarchical sparse sampling strategy reduces computation time while preserving meaningful hierarchical context across multiple resolution levels. The Residual Transformer segmentation network extracts and combines information from the different levels of the sparse descriptor while keeping computational cost low. On an internal dataset of 10,253 CT images and the public TotalSegmentator dataset, the proposed method improved qualitative and quantitative segmentation performance over the current fast organ classifier, running in about 2.24 seconds on CPU hardware. These results suggest the potential for real-time fine organ segmentation.
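
To make the idea of a multi-resolution sparse descriptor concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the 7-point offset pattern, the scale factors, the border clamping, and the function name `sparse_descriptor` are all assumptions chosen for illustration. It only shows how one query point can yield a small, fixed-size set of samples that covers both fine local detail and coarse surrounding context.

```python
import numpy as np


def sparse_descriptor(volume, query, base_offsets, scales=(1, 2, 4, 8)):
    """Sample intensities around `query` at several spatial scales.

    volume       : 3D array of CT intensities, indexed (z, y, x)
    query        : (z, y, x) voxel index of the query point
    base_offsets : (K, 3) integer offsets defining the sampling pattern
    scales       : hypothetical per-level multipliers; larger = coarser context
    Returns an array of shape (len(scales), K), one row per level.
    """
    volume = np.asarray(volume)
    query = np.asarray(query, dtype=int)
    base_offsets = np.asarray(base_offsets, dtype=int)
    upper = np.array(volume.shape) - 1

    levels = []
    for s in scales:
        # Stretch the same sampling pattern by this level's scale factor.
        coords = query + s * base_offsets
        # Clamp so that out-of-bounds samples reuse the border voxel.
        coords = np.clip(coords, 0, upper)
        levels.append(volume[coords[:, 0], coords[:, 1], coords[:, 2]])
    return np.stack(levels)


# Assumed pattern: the centre voxel plus its six axis-aligned neighbours.
OFFSETS = np.array([
    [0, 0, 0],
    [1, 0, 0], [-1, 0, 0],
    [0, 1, 0], [0, -1, 0],
    [0, 0, 1], [0, 0, -1],
])

rng = np.random.default_rng(0)
ct = rng.integers(-1000, 1000, size=(32, 32, 32)).astype(np.float32)
desc = sparse_descriptor(ct, query=(16, 16, 16), base_offsets=OFFSETS)
print(desc.shape)  # (4, 7)
```

In this sketch each query reads only 28 voxels regardless of volume size, which is the kind of saving over whole-volume analysis that the abstract describes. Each row of `desc` could then be treated as one token for a downstream network, although the actual tokenization used in the paper is not specified here.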

Paper Structure

This paper contains 10 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The architecture of the proposed segmentation method.
  • Figure 2: Details of the hierarchical sparse sampling strategy, showing the descriptor generated from a sample query point in the 3D volume. The red "X" marks the query location from which the descriptor is generated through sparse sampling. The white dots mark the sampling locations at multiple spatial resolutions.
  • Figure 3: Visualization of a sample whole-volume segmentation result from the proposed method, annotated with the whole-volume multi-class Dice score. (A) The CT image; (B) ground-truth segmentation labels; (C) segmentation result from the proposed method.
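
For readers unfamiliar with the multi-class Dice score mentioned in Figure 3, the sketch below shows one common way to compute it. The summary does not say how the paper averages over classes, so the choices here (an unweighted mean over foreground classes, skipping classes absent from both volumes) and the function name `multiclass_dice` are assumptions.

```python
import numpy as np


def multiclass_dice(pred, truth, num_classes, ignore_background=True):
    """Unweighted mean Dice over classes present in `pred` or `truth`."""
    pred = np.asarray(pred)
    truth = np.asarray(truth)
    start = 1 if ignore_background else 0  # label 0 assumed to be background
    scores = []
    for c in range(start, num_classes):
        p = pred == c
        t = truth == c
        denom = p.sum() + t.sum()
        if denom == 0:
            continue  # class absent from both label maps: skip it
        # Dice = 2 * |P ∩ T| / (|P| + |T|)
        scores.append(2.0 * np.logical_and(p, t).sum() / denom)
    return float(np.mean(scores)) if scores else float("nan")


# Tiny 2D example; the same code works unchanged on 3D label volumes.
truth = np.array([[0, 1, 1],
                  [2, 2, 0]])
pred = np.array([[0, 1, 2],
                 [2, 2, 0]])
score = multiclass_dice(pred, truth, num_classes=3)
print(round(score, 4))  # 0.7333
```

In the example, class 1 scores 2/3 and class 2 scores 4/5, so the mean is about 0.7333.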