Skewness-Guided Pruning of Multimodal Swin Transformers for Federated Skin Lesion Classification on Edge Devices
Kuniko Paxton, Koorosh Aslansefat, Dhavalkumar Thakker, Yiannis Papadopoulos
TL;DR
This work addresses the challenge of deploying high-performance multimodal skin lesion classifiers on privacy-constrained edge devices by introducing skewness-guided pruning of a Swin Transformer within a horizontal Federated Learning (FL) framework. The method measures the skewness of activation distributions and uses it to structurally prune Multi-Head Self-Attention (MSA) heads and Multi-Layer Perceptron (MLP) intermediate units. Pruning is performed server-side and the pruned model is then refined through FL, achieving approximately 36% model-size reduction with negligible accuracy loss. The approach is demonstrated on HAM10000 with multimodal inputs, showing significant compute and memory savings while maintaining performance in distributed settings. The work highlights the practicality of edge-friendly, privacy-preserving multimodal medical AI and opens avenues for future improvements in FL aggregation and fusion techniques.
Abstract
In recent years, high-performance computer vision models have achieved remarkable success in medical imaging, with some skin lesion classification systems even surpassing dermatology specialists in diagnostic accuracy. However, such models are large and computationally intensive, making them unsuitable for deployment on edge devices. In addition, strict privacy constraints hinder centralized data management, motivating the adoption of Federated Learning (FL). To address these challenges, this study proposes a skewness-guided pruning method that selectively prunes the Multi-Head Self-Attention and Multi-Layer Perceptron layers of a multimodal Swin Transformer based on the statistical skewness of their output distributions. The proposed method was validated in a horizontal FL environment and shown to maintain performance while substantially reducing model complexity. Experiments on the compact Swin Transformer demonstrate approximately 36% model size reduction with no loss in accuracy. These findings highlight the feasibility of achieving efficient model compression and privacy-preserving distributed learning for multimodal medical AI on edge devices.
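The idea of scoring units by the skewness of their outputs can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard moment-based sample skewness g1 = m3 / m2^1.5, and the function names, the use of absolute skewness as the score, and the `prune_lowest` switch are all hypothetical. In particular, the abstract does not state whether high-skew or low-skew units are removed, so the direction is left as a parameter.

```python
def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0.0:
        return 0.0  # constant output: treated here as unskewed
    return m3 / m2 ** 1.5


def select_units_to_prune(activations, prune_ratio, prune_lowest=True):
    """Rank units (e.g. attention heads or MLP units) by |skewness|.

    activations: dict mapping a unit id to a flat list of its output values.
    prune_ratio: fraction of units to select, between 0 and 1.
    prune_lowest: ASSUMPTION, not taken from the paper. If True, the units
        with the smallest |skewness| are selected; if False, the largest.
    Returns (selected_ids, scores).
    """
    scores = {unit: abs(skewness(vals)) for unit, vals in activations.items()}
    # Sort by score, breaking ties by unit id so the result is deterministic.
    ranked = sorted(scores, key=lambda u: (scores[u], u), reverse=not prune_lowest)
    k = int(len(ranked) * prune_ratio)
    return ranked[:k], scores


# Toy usage with hand-made "activations" for two hypothetical heads.
toy = {
    "head_0": [-2.0, -1.0, 0.0, 1.0, 2.0],  # symmetric outputs
    "head_1": [0.0, 0.0, 0.0, 0.0, 10.0],   # right-skewed outputs
}
selected, scores = select_units_to_prune(toy, prune_ratio=0.5)
```

In a federated setting like the one described, a server could apply such a scoring step to the aggregated model and then send the smaller model back to the clients for further training rounds. The statistics used, the thresholds, and the schedule would follow the paper, not this sketch.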
