Wave-GMS: Lightweight Multi-Scale Generative Model for Medical Image Segmentation

Talha Ahmed, Nehal Ahmed Shaikh, Hassan Mohy-ud-Din

TL;DR

Medical image segmentation often relies on heavy models that are ill-suited to budget GPUs. Wave-GMS is a memory-efficient, multi-scale generative framework that couples a trainable multi-resolution encoder with a frozen Tiny-VAE and a Latent Mapping Model (LMM): the LMM translates image latents into mask latents, which are then decoded into segmentation masks. With approximately 2.6 million trainable parameters and no need to load large pretrained vision foundation models, it achieves state-of-the-art performance and strong cross-domain generalization across four public datasets, while supporting large-batch training on limited hardware. The approach has practical value for equitable AI deployment in clinical settings and points toward extensions to 3D data and broader foundation-model integration.
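
The data flow summarized above (multi-scale input, image latent, mask latent, decoded mask) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, 2x2 average pooling stands in for the multi-resolution decomposition, a threshold stands in for the learned Latent Mapping Model, and nearest-neighbour upsampling stands in for the frozen decoder. It is not the authors' implementation, only the shape of the pipeline.

```python
from typing import Callable, List

Grid = List[List[float]]  # a tiny 2D "image" or "latent" as nested lists


def avg_pool_2x2(img: Grid) -> Grid:
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    h, w = len(img), len(img[0])
    return [
        [(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
         for c in range(0, w - 1, 2)]
        for r in range(0, h - 1, 2)
    ]


def pyramid(img: Grid, levels: int) -> List[Grid]:
    """Build a multi-scale view of the input, finest scale first."""
    scales = [img]
    for _ in range(levels - 1):
        scales.append(avg_pool_2x2(scales[-1]))
    return scales


def toy_encoder(scales: List[Grid]) -> Grid:
    """Stand-in for the trainable encoder: keep only the coarsest scale."""
    return scales[-1]


def toy_lmm(image_latent: Grid) -> Grid:
    """Stand-in for the Latent Mapping Model: image latent -> mask latent."""
    return [[1.0 if v > 0.5 else 0.0 for v in row] for row in image_latent]


def make_toy_decoder(factor: int) -> Callable[[Grid], Grid]:
    """Stand-in for the frozen decoder: nearest-neighbour upsampling."""
    def decode(latent: Grid) -> Grid:
        return [
            [v for v in row for _ in range(factor)]
            for row in latent
            for _ in range(factor)
        ]
    return decode


def segment(image: Grid, levels: int = 2) -> Grid:
    """Run the toy pipeline: pyramid -> encoder -> LMM -> decoder."""
    scales = pyramid(image, levels)
    image_latent = toy_encoder(scales)
    mask_latent = toy_lmm(image_latent)
    decoder = make_toy_decoder(2 ** (levels - 1))
    return decoder(mask_latent)


if __name__ == "__main__":
    image = [
        [1.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    for row in segment(image):
        print(row)
```

In the real model the encoder and the mapping model would be learned networks and the decoder a frozen pretrained component; the sketch only shows that the mapping step operates in latent space and that the mask is recovered by decoding.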

Abstract

For equitable deployment of AI tools in hospitals and healthcare facilities, we need Deep Segmentation Networks that offer high performance and can be trained on cost-effective GPUs with limited memory and large batch sizes. In this work, we propose Wave-GMS, a lightweight and efficient multi-scale generative model for medical image segmentation. Wave-GMS has a substantially smaller number of trainable parameters, does not require loading memory-intensive pretrained vision foundation models, and supports training with large batch sizes on GPUs with limited memory. We conducted extensive experiments on four publicly available datasets (BUS, BUSI, Kvasir-Instrument, and HAM10000), demonstrating that Wave-GMS achieves state-of-the-art segmentation performance with superior cross-domain generalizability, while requiring only ~2.6M trainable parameters. Code is available at https://github.com/ATPLab-LUMS/Wave-GMS.

Paper Structure

This paper contains 13 sections, 6 equations, 1 figure, 4 tables.

Figures (1)

  • Figure 1: (Left) Wave-GMS, a lightweight multi-scale generative model for medical image segmentation. (Right) A latent mapping model (LMM) learns the transformation from the multi-scale latent space to the segmentation mask embedding space [huo2024generative].