Learning Multi-axis Representation in Frequency Domain for Medical Image Segmentation

Jiacheng Ruan, Jingsheng Gao, Mingye Xie, Suncheng Xiang

TL;DR

This work proposes the Multi-axis External Weights UNet (MEW-UNet), a U-shaped architecture that replaces self-attention in ViT with the authors' Multi-axis External Weights block. The block performs a Fourier transform along three axes of the input features and applies external weights in the frequency domain.

Abstract

Recently, the Vision Transformer (ViT) has been widely used in medical image segmentation (MIS) because its self-attention mechanism models global knowledge in the spatial domain. However, many studies have focused on improving models in the spatial domain while neglecting frequency-domain information. We therefore propose the Multi-axis External Weights UNet (MEW-UNet), a U-shaped architecture that replaces self-attention in ViT with our Multi-axis External Weights block. Specifically, the block performs a Fourier transform along three axes of the input features and applies external weights, produced by our External Weights Generator, in the frequency domain. An inverse Fourier transform then maps the features back to the spatial domain. We evaluate our model on four datasets (Synapse, ACDC, ISIC17, and ISIC18), and our approach demonstrates competitive performance, owing to its effective use of frequency-domain information.
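The transform, weight, and inverse-transform sequence described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `(C, H, W)` layout, and the use of fixed all-ones weights in place of the External Weights Generator are assumptions made for illustration, and the sketch keeps only the real part after the inverse transform as a simplification. How the three branches are merged is left open here, so the outputs are returned separately.

```python
import numpy as np

def external_weight_branch(x, weight, axes):
    """Hypothetical helper: 2D DFT over `axes`, element-wise weighting, inverse DFT."""
    freq = np.fft.fft2(x, axes=axes)            # spatial -> frequency domain
    freq = freq * weight                        # apply the external weight
    return np.fft.ifft2(freq, axes=axes).real   # back to the spatial domain (real part only)

def multi_axis_external_weights(x, weights):
    """x: (C, H, W) feature map; weights: dict with one array per axis pair."""
    axis_pairs = {"HW": (1, 2), "CW": (0, 2), "CH": (0, 1)}
    return {name: external_weight_branch(x, weights[name], axes)
            for name, axes in axis_pairs.items()}

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))

# Stand-in weights (assumption): all ones, so each branch should reproduce x.
identity = {name: np.ones(x.shape, dtype=complex) for name in ("HW", "CW", "CH")}
out = multi_axis_external_weights(x, identity)
```

With all-ones weights each branch reduces to a forward transform followed by its inverse, which makes the sketch easy to sanity-check; in the paper the weights come from the External Weights Generator rather than being fixed.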

Paper Structure (15 sections, 7 equations, 4 figures, 5 tables)


Figures (4)

  • Figure 1: Frequency-aware clues for medical image segmentation. (a) The Discrete Fourier Transform (DFT) is applied to three $10\times10$ patches. Blue represents segmentation region 1 (liver), red represents segmentation region 2 (gallbladder), and green represents the background. (b) The frequency signal strength curves of the selected patches when DFT is performed on a single axis only. (c) The corresponding curves when DFT is performed on all three axes.
  • Figure 2: (a) The overall architecture of MEW-UNet. (b) The Multi-axis External Weights mechanism. $\mathcal{F}_{(H,W)}$ and $\mathcal{F}^{-1}_{(H,W)}$ denote the 2D DFT and 2D inverse DFT along the Height-Width axes of the feature map. $\mathcal{F}_{(C,W)}$, $\mathcal{F}^{-1}_{(C,W)}$, $\mathcal{F}_{(C,H)}$, and $\mathcal{F}^{-1}_{(C,H)}$ are defined analogously. (c) The Multi-axis External Weights block. FFN denotes the feed-forward layer.
  • Figure 3: Visual comparison on the Synapse and ISIC2018 datasets.
  • Figure 4: Visual comparison of the proposed Multi-axis External Weights block on the ISIC2018 dataset.