D2-MLP: Dynamic Decomposed MLP Mixer for Medical Image Segmentation

Jin Yang, Xiaobing Yu, Peijie Qiu

TL;DR

The paper addresses global context learning in medical image segmentation by introducing the Dynamic Decomposed MLP Mixer (D2-MLP), built around a Dynamic Decomposed Mixer (DDM) module that extracts spatial and channel information separately and fuses it adaptively. The DDM comprises two Spatially Decomposed Mixers (one along height, one along width), a Channel Mixer, and two dynamic mixing mechanisms (Spatial-wise and Channel-wise) that model inter-dependencies between the resulting features and fuse them. Integrated into a four-stage, U-shaped Transformer-based encoder-decoder, D2-MLP outperforms state-of-the-art CNN-, ViT-, and hybrid-based methods on the FLARE 2021 and MSD Liver datasets, and ablations show the benefit of the DDM and identify the best-performing patch settings. The work offers a flexible, efficient way to combine spatial and channel information through dynamic fusion for dense medical image segmentation.

Abstract

Convolutional neural networks are widely used for segmentation tasks in medical images. However, they struggle to learn global features adaptively because of the inherent locality of convolutional operations. In contrast, MLP Mixers have been proposed as a backbone that learns global information across channels with low complexity. However, they cannot capture spatial features efficiently, and they lack effective mechanisms to fuse and mix features adaptively. To tackle these limitations, we propose a novel Dynamic Decomposed Mixer module. It employs novel Mixers to extract features and aggregate information across different spatial locations and channels. It also employs novel dynamic mixing mechanisms to model inter-dependencies between channel and spatial feature representations and to fuse them adaptively. We then incorporate it into a U-shaped Transformer-based architecture to build a novel network, termed the Dynamic Decomposed MLP Mixer. We evaluated it for medical image segmentation on two datasets, where it achieved better segmentation performance than other state-of-the-art methods.

Paper Structure (17 sections, 7 equations, 2 figures, 2 tables)

Figures (2)

  • Figure 1: (A) D2-MLP is a 4-stage encoder-decoder architecture, and each MLP Mixer block consists of a DDM module and a channel MLP. (B) The DDM module decomposes the input feature $\boldsymbol{X}$ into $N$ patches. These patches are spatially reshaped and concatenated along height $H$ and width $W$ to form features $\boldsymbol{X}_H$ and $\boldsymbol{X}_W$, respectively. Two MLPs then aggregate information from $\boldsymbol{X}_H$ and $\boldsymbol{X}_W$, one along each dimension. Spatial-wise Dynamic Mixing strengthens the interactions between $\boldsymbol{X}_H$ and $\boldsymbol{X}_W$, producing $\boldsymbol{X}_H^*$ and $\boldsymbol{X}_W^*$. A Channel Mixer aggregates information across channels from the input features $\boldsymbol{X}$, producing features $\boldsymbol{X}_C$. Lastly, features $\boldsymbol{X}_H^*$, $\boldsymbol{X}_W^*$, and $\boldsymbol{X}_C$ are adaptively fused in Channel-wise Dynamic Mixing. (C) Spatial-wise Dynamic Mixing. (D) Channel-wise Dynamic Mixing.
  • Figure 2: Qualitative comparison between D2-MLP and other methods on (A) the FLARE Multi-organ and (B) the MSD Liver Tumor datasets.
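
The data flow described in the Figure 1(B) caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the patch decomposition step is skipped, `tanh` stands in for the activation, and the exact forms of the two dynamic mixing steps (cross gating for the spatial-wise step, per-channel softmax weights for the channel-wise step) are assumptions chosen only to show where each feature goes. All function and parameter names (`ddm_sketch`, `init_params`, `hidden`) are hypothetical.

```python
import numpy as np


def mlp(x, w1, w2):
    """Two-layer MLP over the last axis; tanh stands in for the activation."""
    return np.tanh(x @ w1) @ w2


def softmax(z, axis):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(h, w, c, hidden, rng):
    """Random weights for the three mixers (hypothetical layout)."""
    def pair(dim):
        return (0.1 * rng.standard_normal((dim, hidden)),
                0.1 * rng.standard_normal((hidden, dim)))
    return {"height": pair(h), "width": pair(w), "channel": pair(c)}


def ddm_sketch(x, params):
    """x: (H, W, C). Returns (fused features (H, W, C), fusion weights (3, C))."""
    # Height mixer: bring H to the last axis, mix along it, restore the layout.
    x_h = np.swapaxes(mlp(np.swapaxes(x, 0, 2), *params["height"]), 0, 2)
    # Width mixer: same idea along W.
    x_w = np.swapaxes(mlp(np.swapaxes(x, 1, 2), *params["width"]), 1, 2)
    # Channel mixer: C is already the last axis.
    x_c = mlp(x, *params["channel"])

    # Spatial-wise dynamic mixing (ASSUMED form): each spatial branch is
    # gated per location by a summary of the other branch.
    gate_from_w = sigmoid(x_w.mean(axis=-1, keepdims=True))  # (H, W, 1)
    gate_from_h = sigmoid(x_h.mean(axis=-1, keepdims=True))  # (H, W, 1)
    x_h_star = x_h * gate_from_w
    x_w_star = x_w * gate_from_h

    # Channel-wise dynamic mixing (ASSUMED form): per-channel softmax
    # weights over the three branches, from globally pooled descriptors.
    branches = np.stack([x_h_star, x_w_star, x_c])           # (3, H, W, C)
    weights = softmax(branches.mean(axis=(1, 2)), axis=0)    # (3, C)
    fused = (weights[:, None, None, :] * branches).sum(axis=0)
    return fused, weights


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 4))
fused, weights = ddm_sketch(x, init_params(8, 6, 4, hidden=16, rng=rng))
print(fused.shape, weights.shape)
```

In this sketch the fused output keeps the input's `(H, W, C)` shape, and the three fusion weights for each channel sum to one, so the channel-wise step acts as a convex combination of the height, width, and channel branches.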