Wave-GMS: Lightweight Multi-Scale Generative Model for Medical Image Segmentation
Talha Ahmed, Nehal Ahmed Shaikh, Hassan Mohy-ud-Din
TL;DR
Medical image segmentation often relies on heavy models that are ill-suited to budget GPUs. Wave-GMS is a memory-efficient, multi-scale generative framework that couples a trainable multi-resolution encoder with a frozen Tiny-VAE and a Latent Mapping Model: the Latent Mapping Model translates image latents into mask latents, which are then decoded into segmentation masks. With approximately 2.6 million trainable parameters and no need to load large pretrained vision foundation models, it achieves state-of-the-art performance and strong cross-domain generalization on four public datasets while supporting large-batch training on limited hardware. The approach has practical value for equitable AI deployment in clinical settings and points toward extensions to 3D data and broader foundation-model integration.
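The data flow described above (trainable encoder, latent mapping, frozen decoder) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class names `TinyAutoencoder` and `MultiResEncoder`, the layer sizes, the three-scale pooling scheme, and the channel-averaging step that produces a one-channel mask are all assumptions made for illustration, and the toy modules are far smaller than the ~2.6M-parameter model. It only shows how a frozen autoencoder can be combined with trainable components so that only the latter are counted and updated.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyAutoencoder(nn.Module):
    """Hypothetical stand-in for the frozen Tiny-VAE; NOT the real model."""

    def __init__(self, channels: int = 3, latent: int = 4):
        super().__init__()
        # 8x spatial downsampling into a small latent grid, mirrored on decode.
        self.enc = nn.Conv2d(channels, latent, kernel_size=8, stride=8)
        self.dec = nn.ConvTranspose2d(latent, channels, kernel_size=8, stride=8)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        return self.enc(x)

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        return self.dec(z)


class MultiResEncoder(nn.Module):
    """Hypothetical trainable encoder that fuses three image scales."""

    scales = (1, 2, 4)

    def __init__(self, channels: int = 3, latent: int = 4):
        super().__init__()
        # Each branch uses a stride chosen so all outputs share an H/8 x W/8 grid.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, latent, kernel_size=8 // s, stride=8 // s)
            for s in self.scales
        )
        self.fuse = nn.Conv2d(len(self.scales) * latent, latent, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = []
        for s, branch in zip(self.scales, self.branches):
            xs = x if s == 1 else F.avg_pool2d(x, kernel_size=s)
            feats.append(branch(xs))
        return self.fuse(torch.cat(feats, dim=1))


def count_trainable(module: nn.Module) -> int:
    """Number of parameters that would receive gradient updates."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


vae = TinyAutoencoder()
vae.requires_grad_(False)  # freeze: no gradients are stored for these weights
vae.eval()

encoder = MultiResEncoder()
# Hypothetical latent mapping model: image latents -> mask latents.
mapper = nn.Sequential(
    nn.Conv2d(4, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 4, kernel_size=3, padding=1),
)


def segment(image: torch.Tensor) -> torch.Tensor:
    """Image -> image latent -> mask latent -> decoded mask probabilities."""
    z_image = encoder(image)
    z_mask = mapper(z_image)
    decoded = vae.decode(z_mask)
    # Assumption: collapse decoded channels into a single-channel mask.
    return torch.sigmoid(decoded.mean(dim=1, keepdim=True))


if __name__ == "__main__":
    batch = torch.randn(2, 3, 64, 64)
    print("mask shape:", tuple(segment(batch).shape))
    print("trainable params:", count_trainable(encoder) + count_trainable(mapper))
    print("frozen VAE trainable params:", count_trainable(vae))
```

Because the autoencoder's weights are frozen, an optimizer built from `list(encoder.parameters()) + list(mapper.parameters())` would update only the trainable parts, which is the general mechanism that keeps the trainable-parameter count and optimizer memory small.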
Abstract
For equitable deployment of AI tools in hospitals and healthcare facilities, we need deep segmentation networks that offer high performance and can be trained with large batch sizes on cost-effective GPUs with limited memory. In this work, we propose Wave-GMS, a lightweight and efficient multi-scale generative model for medical image segmentation. Wave-GMS has substantially fewer trainable parameters, does not require loading memory-intensive pretrained vision foundation models, and supports training with large batch sizes on GPUs with limited memory. We conducted extensive experiments on four publicly available datasets (BUS, BUSI, Kvasir-Instrument, and HAM10000), demonstrating that Wave-GMS achieves state-of-the-art segmentation performance with superior cross-domain generalizability while requiring only ~2.6M trainable parameters. Code is available at https://github.com/ATPLab-LUMS/Wave-GMS.
