MM-UNet: A Mixed MLP Architecture for Improved Ophthalmic Image Segmentation

Zunjie Xiao, Xiaoqing Zhang, Risa Higashita, Jiang Liu

TL;DR

The paper tackles ophthalmic image segmentation, where the locality bias of CNNs and the computational cost of transformers hinder performance on small datasets. It introduces MM-UNet, a hybrid UNet built around a Multi-Scale MLP (MMLP) module that partitions channels into groups and applies local token mixing to capture multi-scale dependencies without heavy self-attention. The MMLP omits channel-mixing MLPs and gives each channel group its own degree of locality to balance global context against local detail. MM-UNet achieves state-of-the-art results on AS-OCT and REFUGE2 with competitive parameter counts, and ablations show that Local Token Mixing (LTM) outperforms global token mixing. Overall, MM-UNet provides an efficient, pretraining-friendly approach to ophthalmic segmentation with potential for broader biomedical imaging applications.

Abstract

Ophthalmic image segmentation serves as a critical foundation for ocular disease diagnosis. Although fully convolutional neural networks (CNNs) are commonly employed for segmentation, they are constrained by inductive biases and face challenges in establishing long-range dependencies. Transformer-based models address these limitations but introduce substantial computational overhead. Recently, a simple yet efficient Multilayer Perceptron (MLP) architecture was proposed for image classification, achieving competitive performance relative to advanced transformers. However, its effectiveness for ophthalmic image segmentation remains unexplored. In this paper, we introduce MM-UNet, an efficient Mixed MLP model tailored for ophthalmic image segmentation. Within MM-UNet, we propose a multi-scale MLP (MMLP) module that facilitates the interaction of features at various depths through a grouping strategy, enabling simultaneous capture of global and local information. We conducted extensive experiments on both a private anterior segment optical coherence tomography (AS-OCT) image dataset and a public fundus image dataset. The results demonstrated the superiority of our MM-UNet model in comparison to state-of-the-art deep segmentation networks.
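
The grouping idea described above can be illustrated with a small sketch. It is not the authors' implementation: the excerpt gives no details of the MMLP beyond channel partitioning and local token mixing, so the function names (`local_token_mix`, `multi_scale_mix`), the 1D token layout, the non-overlapping windows, and the uniform averaging weights (standing in for learned mixing weights) are all assumptions made for illustration.

```python
from typing import List, Sequence

Tokens = List[List[float]]  # tokens[i][c] = value of channel c at token i


def local_token_mix(tokens: Tokens, window: int) -> Tokens:
    """Mix tokens inside non-overlapping windows of `window` tokens.

    Uniform averaging is a placeholder for a learned window-by-window
    weight matrix (assumption, not taken from the paper).
    """
    mixed: Tokens = []
    for start in range(0, len(tokens), window):
        block = tokens[start:start + window]
        n_channels = len(block[0])
        mean = [sum(tok[c] for tok in block) / len(block)
                for c in range(n_channels)]
        # Every token in the window receives the same mixed value.
        mixed.extend(list(mean) for _ in block)
    return mixed


def multi_scale_mix(tokens: Tokens, windows: Sequence[int]) -> Tokens:
    """Split channels into len(windows) equal groups and mix each group
    with its own window size, then concatenate the groups again."""
    n_channels = len(tokens[0])
    n_groups = len(windows)
    if n_channels % n_groups != 0:
        raise ValueError("channel count must be divisible by group count")
    size = n_channels // n_groups
    out: Tokens = [[] for _ in tokens]
    for g, window in enumerate(windows):
        group = [tok[g * size:(g + 1) * size] for tok in tokens]
        for row, mixed_tok in zip(out, local_token_mix(group, window)):
            row.extend(mixed_tok)
    return out


# Four tokens, two channels: channel 0 is mixed in windows of 2 tokens
# (local detail), channel 1 across all 4 tokens (global context).
example = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0], [7.0, 40.0]]
result = multi_scale_mix(example, windows=(2, 4))
# result == [[2.0, 25.0], [2.0, 25.0], [6.0, 25.0], [6.0, 25.0]]
```

In this toy setup, a small window keeps a channel group sensitive to local structure, while a window spanning all tokens behaves like global token mixing. That is one way to read the "simultaneous capture of global and local information" described in the abstract.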

Paper Structure

This paper contains 11 sections, 7 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: The proposed multi-scale MLP (MMLP) block.
  • Figure 2: The overall architecture of MM-UNet.
  • Figure 3: The two ophthalmic datasets used in this work. (a) and (b) are an AS-OCT image and its segmentation label, respectively; (c) and (d) show the fundus image from the REFUGE2 dataset.
  • Figure 4: Comparison of the proposed MM-UNet with other state-of-the-art models.