Contextual Embedding Learning to Enhance 2D Networks for Volumetric Image Segmentation

Zhuoyuan Wang; Dong Sun; Xiangyun Zeng; Ruodai Wu; Yi Wang

TL;DR

The paper tackles the limited volumetric context of 2D segmentation by introducing a Contextual Embedding (CE) block that transfers inter-slice information through learned embeddings and a distance-based neighbor matching mechanism. This CE block, combined with an Attention Merge Module, refines per-slice predictions to yield coherent volumetric segmentations while remaining memory-efficient. Across PROMISE12 and CHAOS, CE-enhanced 2D networks achieve segmentation performance on par with or approaching 3D models, with significantly lower computational cost. The work demonstrates a practical, plug-and-play approach to bridge 2D and 3D segmentation, enabling better volumetric results in resource-constrained settings.

Abstract

The segmentation of organs in volumetric medical images plays an important role in computer-aided diagnosis and treatment/surgery planning. Conventional 2D convolutional neural networks (CNNs) can hardly exploit the spatial correlation of volumetric data. Current 3D CNNs can extract more powerful volumetric representations, but they usually require excessive memory and computation. In this study, we aim to enhance 2D networks with contextual information for better volumetric image segmentation. Accordingly, we propose a contextual embedding learning approach that helps 2D CNNs capture spatial information properly. Our approach leverages the learned embedding and the slice-wisely neighboring matching as a soft cue to guide the network. In this way, contextual information is transferred slice by slice, boosting the volumetric representation of the network. Experiments on a challenging prostate MRI dataset (PROMISE12) and an abdominal CT dataset (CHAOS) show that our contextual embedding learning can effectively leverage the inter-slice context and improve segmentation performance. The proposed approach is a plug-and-play, memory-efficient solution for enhancing 2D networks for volumetric segmentation. Our code is publicly available at https://github.com/JuliusWang-7/CE_Block.
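The slice-by-slice transfer described in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: `segment_2d` and `refine_with_context` are hypothetical callables standing in for the 2D backbone and the CE block, the neighbor offset `offset` is assumed to be 1, and the sketch assumes that the neighbor's already refined prediction is the one passed forward.

```python
def segment_volume(slices, segment_2d, refine_with_context, offset=1):
    """Segment a volume slice by slice, passing context from a neighboring slice.

    slices: sequence of 2D slices, ordered along the through-plane axis.
    segment_2d: hypothetical callable, slice -> plain 2D prediction.
    refine_with_context: hypothetical callable taking
        (current slice, current prediction, neighbor slice, neighbor prediction)
        and returning a refined prediction for the current slice.
    offset: assumed distance (in slices) to the neighbor used as context.
    """
    refined = []
    for n, current in enumerate(slices):
        p_n = segment_2d(current)  # plain per-slice prediction
        if n < offset:
            # No earlier neighbor exists yet, so keep the plain 2D prediction.
            refined.append(p_n)
        else:
            neighbor = slices[n - offset]
            p_neighbor = refined[n - offset]  # assumption: reuse the refined result
            refined.append(refine_with_context(current, p_n, neighbor, p_neighbor))
    return refined
```

Only one neighboring prediction is alive at each step of the loop, which is one way to read the memory argument for enhancing a 2D network rather than switching to a 3D one.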

Paper Structure (14 sections, 6 equations, 9 figures, 3 tables, 1 algorithm)

Introduction
Methods
Overview of the Contextual Embedding Block
Embedding Space
Slice-wisely Neighboring Matching
Attention Merge Module
Experiments and Results
Experimental Data and Pre-processing
Implementation Details
Evaluation Metrics
Segmentation Accuracy
Efficiency Comparison
Discussion
Conclusion

Figures (9)

Figure 1: For volumetric medical images, conventional 2D networks segment each 2D slice individually and can hardly capture the context between slices, which results in incomplete and discontinuous segmentations, as shown in (d). Our contextual embedding (CE) approach transfers contextual information via a slice-wisely neighboring matching mechanism, thus boosting the volumetric representation of the 2D network. In (d), the blue, green, and orange surfaces indicate the ground-truth segmentation, the conventional 2D result, and the CE-enhanced result, respectively. Our approach makes the segmentation smoother and more complete in 3D.
Figure 2: Schematic overview of the proposed network. The yellow arrows indicate the workflow of conventional 2D segmentation using the encoder-decoder architecture, whereas the pink flow shows the plug-and-play contextual embedding (CE) block that enhances the volumetric representation of the 2D network. Specifically, the CE block leverages the prediction of the neighboring slice (i.e., $P_{n-l}$ of slice $S_{n-l}$) to calculate a distance map by matching the current slice embedding to the embedding of the neighboring slice (see details in Fig. 3). The CE block then aggregates the neighboring matching distance map, the prediction of the neighboring slice ($P_{n-l}$), the original prediction of the current slice ($P_{n}$), and the backbone feature ($B_{n}$) to generate the refined prediction ($P_{n-CE}$) of the current slice ($S_{n}$).
Figure 3: The detailed design of the contextual embedding block. To segment the current slice ($S_n$), the block uses the backbone features ($B$), the embedding vectors ($E$), and the prediction of the neighboring slice ($P_{n-l}$ of $S_{n-l}$). First, a distance map ($D_{n}$) is obtained by matching the current slice embedding ($E_n$) to the embedding of the neighboring slice ($E_{n-l}$); see the green flow. Then, $D_{n}$ is combined with the prediction of the neighboring slice ($P_{n-l}$) and the backbone feature ($B_{n}$) to generate the new prediction of the current slice ($P_{n'}$); see the yellow flow. Finally, $P_{n'}$ and the original prediction of the current slice ($P_{n}$) are aggregated through an attention merge module (AMM) to produce the final segmentation. The details of the AMM are shown in Fig. 4.
Figure 4: The structure of the attention merge module (AMM). The new prediction $P_{n'}$ and the original prediction $P_{n}$ of the current slice are aggregated through the AMM. The AMM consists of three sequential functions, $F_{sq}$, $F_{score}$, and $F_{mul}$, followed by the segmentation convolutions, which fuse the information and generate the final prediction $P_{n-CE}$.
Figure 5: Four transversal slices from the PROMISE12 dataset. The T2-weighted MRI images were collected at different centers with different protocols: (a) Haukeland University Hospital, Siemens, 1.5T, with endorectal coil (ERC); (b) Beth Israel Deaconess Medical Center University Hospital, GE, 3.0T, with ERC; (c) University College London, Siemens, 1.5/3.0T, without ERC; (d) Radboud University Nijmegen Medical Centre, Siemens, 3.0T, without ERC.
...and 4 more figures
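
The neighboring matching step in the Figure 3 caption (matching $E_n$ to $E_{n-l}$ to obtain $D_n$) can be illustrated with a small dependency-free sketch. Several details here are assumptions that the captions above do not confirm: the squared Euclidean distance squashed into [0, 1) with `tanh`, the restriction to foreground pixels of the neighboring prediction, and the local search window (`window` is a hypothetical parameter). The released code at the repository linked in the abstract is the authoritative version.

```python
import math


def neighbor_matching_distance(e_cur, e_prev, p_prev, window=1):
    """Return a distance map D_n for the current slice.

    e_cur, e_prev: H x W grids (nested lists) of embedding vectors for the
        current slice and the neighboring slice.
    p_prev: H x W grid of booleans, the neighboring slice's foreground prediction.
    window: assumed half-width of the local search window, in pixels.

    Each output value is the smallest bounded embedding distance between a
    current-slice pixel and any foreground neighbor pixel inside the window.
    A value of 1.0 means no foreground neighbor pixel was found in the window.
    """
    h, w = len(e_cur), len(e_cur[0])
    dist = [[1.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for ny in range(max(0, y - window), min(h, y + window + 1)):
                for nx in range(max(0, x - window), min(w, x + window + 1)):
                    if not p_prev[ny][nx]:
                        continue  # only match against predicted foreground
                    d2 = sum(
                        (a - b) ** 2
                        for a, b in zip(e_cur[y][x], e_prev[ny][nx])
                    )
                    # Assumed squashing: tanh maps [0, inf) to [0, 1).
                    dist[y][x] = min(dist[y][x], math.tanh(d2 / 2.0))
    return dist
```

Low values in the resulting map mark pixels whose embedding resembles nearby foreground in the neighboring slice, which is the kind of soft cue the abstract describes. A practical implementation would vectorize these loops on the GPU rather than iterate in Python.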
