Vision Foundation Models in Medical Image Analysis: Advances and Challenges

Pengchen Liang; Bin Pu; Haishan Huang; Yiwei Li; Hualiang Wang; Weibo Ma; Qing Chang

TL;DR

Vision foundation models offer powerful long-range modeling for medical image segmentation but face domain gaps, data scarcity, and deployment constraints. The paper surveys adaptation methods for ViT and SAM in medicine, including adapter-based domain transfer, knowledge distillation, model compression for edge devices, and federated learning (FL) frameworks. It identifies bottlenecks in multi-scale context modeling, transfer semantics, and FL efficiency, and proposes directions toward theory-guided designs, privacy-preserving computation with compression, and cross-task collaborative learning. Collectively, these advances highlight the potential of foundation models to transform clinical workflows through scalable, privacy-conscious, and data-efficient medical imaging analytics.

Abstract

The rapid development of Vision Foundation Models (VFMs), particularly Vision Transformers (ViT) and the Segment Anything Model (SAM), has driven significant advances in medical image analysis. These models capture long-range dependencies well and generalize strongly across segmentation tasks. However, adapting such large models to medical image analysis presents several challenges, including domain differences between medical and natural images, the need for efficient model adaptation strategies, and the limitations of small-scale medical datasets. This paper reviews state-of-the-art research on adapting VFMs to medical image segmentation, focusing on the challenges of domain adaptation, model compression, and federated learning. We discuss the latest developments in adapter-based improvements, knowledge distillation techniques, and multi-scale contextual feature modeling, and propose future directions to overcome these bottlenecks. Our analysis highlights the potential of VFMs, together with emerging methodologies such as federated learning and model compression, to revolutionize medical image analysis and enhance clinical applications. The goal of this work is to provide a comprehensive overview of current approaches and to suggest key areas for future research that can drive the next wave of innovation in medical image segmentation.
