LV-UNet: A Lightweight and Vanilla Model for Medical Image Segmentation

Juntao Jiang, Mengmeng Wang, Huizhong Tian, Lingbo Cheng, Yong Liu

TL;DR

LV-UNet targets lightweight, robust medical image segmentation for point-of-care and mobile devices. It combines a pre-trained MobileNetv3-Large encoder with fusible expansion modules and a deep training strategy, then re-parametrizes the network into a deployment mode to reduce parameters and FLOPs. On five datasets (ISIC 2016, BUSI, CVC-ClinicDB, CVC-ColonDB, Kvasir-SEG), it achieves competitive segmentation accuracy at significantly lower computational cost than state-of-the-art and vanilla baselines. The study demonstrates a practical design pattern, merging pre-trained backbones with fusible modules and re-parametrization, that could guide future lightweight medical image segmentation research.

Abstract

While large models have achieved significant progress in computer vision, challenges such as optimization complexity, the intricacy of transformer architectures, computational constraints, and practical application demands highlight the importance of simpler model designs in medical image segmentation. This need is particularly pronounced in mobile medical devices, which require lightweight, deployable models with real-time performance. However, existing lightweight models often suffer from poor robustness across datasets, limiting their widespread adoption. To address these challenges, this paper introduces LV-UNet, a lightweight and vanilla model that leverages pre-trained MobileNetv3-Large backbones and incorporates fusible modules. LV-UNet employs an enhanced deep training strategy and switches to a deployment mode during inference by re-parametrization, significantly reducing parameter count and computational overhead. Experimental results on the ISIC 2016, BUSI, CVC-ClinicDB, CVC-ColonDB, and Kvasir-SEG datasets demonstrate a better trade-off between performance and computational load. The code will be released at https://github.com/juntaoJianggavin/LV-UNet.
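
To make the idea of re-parametrization concrete, here is a minimal sketch of one generic instance of it: folding an inference-mode batch normalization into the linear layer that precedes it, so that two operations collapse into one with identical output. This is an illustration under stated assumptions, not the authors' fusible block. The helper names (`linear`, `batchnorm`, `fuse`) and all numeric values are hypothetical, and a 1x1 convolution is modeled as a matrix applied to a single pixel's feature vector.

```python
import math

def linear(x, W, b):
    """Apply y = W x + b to one feature vector (a 1x1 convolution at one pixel)."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm using fixed running statistics."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]

def fuse(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold the batch norm into the preceding layer's weights and bias."""
    # Per-output-channel scale applied by the batch norm.
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    W_fused = [[s * w for w in row] for row, s in zip(W, scale)]
    b_fused = [s * (bi - m) + bt for s, bi, m, bt in zip(scale, b, mean, beta)]
    return W_fused, b_fused

# Hypothetical toy parameters (assumptions for illustration only).
W = [[0.5, -1.0], [2.0, 0.25]]
b = [0.1, -0.3]
gamma, beta = [1.5, 0.8], [0.2, -0.1]
mean, var = [0.05, 1.0], [0.9, 2.0]
x = [1.0, 2.0]

# Training-style graph: two separate operations.
two_step = batchnorm(linear(x, W, b), gamma, beta, mean, var)

# Deployment-style graph: a single fused layer.
W_f, b_f = fuse(W, b, gamma, beta, mean, var)
one_step = linear(x, W_f, b_f)

print(two_step)
print(one_step)
```

The two printed vectors agree up to floating-point rounding, which is the property that lets a network train with the richer structure and deploy with fewer operations. The fusion is only valid because both steps are affine; a nonlinearity between them would have to be removed or reduced to the identity before merging.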

Paper Structure (26 sections, 6 equations, 3 figures, 7 tables)

Figures (3)

  • Figure 1: The architecture of LV-UNet. The basic modules include pre-trained MobileNetv3-Large blocks (the initial convolution stage and groups i to iii, i.e., the first to ninth inverted residual blocks), fusible encoder blocks, fusible decoder blocks, skip connections, and the output block.
  • Figure 2: The architecture of the fusible blocks in the training and deployment modes.
  • Figure 3: Example visualizations of segmentation results of different models.