SegStitch: Multidimensional Transformer for Robust and Efficient Medical Imaging Segmentation
Shengbo Tan, Zeyu Zhang, Ying Cai, Daji Ergu, Lin Wu, Binbin Hu, Pengzhang Yu, Yang Zhao
TL;DR
SegStitch addresses the challenge of efficient, robust 3D medical image segmentation by integrating transformers with neural memory ordinary differential equations (nmODE) and by using axial patches with position-aware tokenization. The architecture employs shared queries for dual fine- and coarse-grained self-attention to capture long-range dependencies while maintaining linear-complexity, and it embeds an nmODE block in the decoder to improve stability and feature integration. Empirical results on multiple public datasets (including Synapse and ACDC, with references to BTCV in the abstract) show significant accuracy gains over state-of-the-art methods, along with substantial parameter and FLOP reductions relative to UNETR. The work demonstrates strong potential for real-world clinical deployment due to improved segmentation performance and increased computational efficiency, supported by comprehensive ablations and robustness analyses.
Abstract
Medical imaging segmentation plays a significant role in the automatic recognition and analysis of lesions. State-of-the-art methods, particularly those utilizing transformers, have been prominently adopted in 3D semantic segmentation due to their superior performance in scalability and generalizability. However, plain vision transformers encounter challenges due to their neglect of local features and their high computational complexity. To address these challenges, we introduce three key contributions: Firstly, we proposed SegStitch, an innovative architecture that integrates transformers with denoising ODE blocks. Instead of taking whole 3D volumes as inputs, we adapt axial patches and customize patch-wise queries to ensure semantic consistency. Additionally, we conducted extensive experiments on the BTCV and ACDC datasets, achieving improvements up to 11.48% and 6.71% respectively in mDSC, compared to state-of-the-art methods. Lastly, our proposed method demonstrates outstanding efficiency, reducing the number of parameters by 36.7% and the number of FLOPS by 10.7% compared to UNETR. This advancement holds promising potential for adapting our method to real-world clinical practice. The code will be available at https://github.com/goblin327/SegStitch
