Quantization-Aware Neuromorphic Architecture for Efficient Skin Disease Classification on Resource-Constrained Devices
Haitian Wang, Xinyu Wang, Yiren Wang, Zichen Geng, Xian Zhang, Yu Zhang, Bo Miao
TL;DR
The paper tackles accurate skin lesion classification on privacy-sensitive, resource-constrained edge devices. It introduces QANA, a quantization-aware neuromorphic framework that combines Ghost-based feature extraction, SA-ECA attention, SE recalibration, and spike-compatible transformations so that a trained CNN can be converted to an SNN and deployed on BrainChip Akida. Experiments on HAM10000 and a clinical dataset show higher accuracy than prior CNN-to-SNN methods, along with large reductions in latency and energy compared with GPU-based CNNs and prior CNN-to-SNN methods: on Akida, QANA needs 1.5 ms and 1.7 mJ per image, over 94.6% less latency and 98.6% less energy than GPU-based CNNs. The approach shows that real-time, on-device dermatology is practical, supports incremental learning, and remains robust on imbalanced data, enabling privacy-preserving medical inference at the edge.
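The TL;DR names SA-ECA attention and SE recalibration but does not give their internals. As a hedged illustration, the NumPy sketch below implements plain efficient channel attention (ECA) and a squeeze-and-excitation (SE) block in their standard forms; the SA-ECA variant used in QANA may differ. The function names `eca` and `se`, the kernel values, the reduction ratio `r = 4`, and the ECA-then-SE ordering are all assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def eca(x, kernel):
    """ECA-style channel attention on a feature map x of shape (C, H, W).

    kernel: odd-length 1D weights shared across all channels (hypothetical values).
    """
    c, k = x.shape[0], kernel.shape[0]
    pooled = x.mean(axis=(1, 2))                 # global average pooling -> (C,)
    padded = np.pad(pooled, k // 2)              # zero-pad so the output keeps length C
    # Sliding 1D filter over the channel axis: each channel's weight depends
    # only on its k nearest neighbours (local cross-channel interaction).
    mixed = np.array([padded[i:i + k] @ kernel for i in range(c)])
    return x * sigmoid(mixed)[:, None, None]     # rescale each channel by a gate in (0, 1)


def se(x, w1, b1, w2, b2):
    """Squeeze-and-excitation on x of shape (C, H, W); w1: (C//r, C), w2: (C, C//r)."""
    s = x.mean(axis=(1, 2))                      # squeeze: global average pooling -> (C,)
    z = np.maximum(w1 @ s + b1, 0.0)             # bottleneck fully connected layer + ReLU
    gate = sigmoid(w2 @ z + b2)                  # expand back to C channels + sigmoid
    return x * gate[:, None, None]               # recalibrate channels


# Usage with random weights (assumed: ECA applied before SE, reduction ratio r = 4).
rng = np.random.default_rng(0)
c, r = 8, 4
x = rng.standard_normal((c, 4, 4))
y = eca(x, np.array([0.2, 0.5, 0.3]))
y2 = se(y, rng.standard_normal((c // r, c)), np.zeros(c // r),
        rng.standard_normal((c, c // r)), np.zeros(c))
print(y.shape, y2.shape)  # (8, 4, 4) (8, 4, 4)
```

ECA replaces SE's two fully connected layers with a single shared 1D kernel over neighbouring channels, so it adds only k parameters; that kind of parameter economy is what makes such blocks attractive on resource-constrained hardware.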
Abstract
Accurate and efficient skin lesion classification on edge devices is critical for accessible dermatological care but remains challenging due to computational, energy, and privacy constraints. We introduce QANA, a novel quantization-aware neuromorphic architecture for incremental skin lesion classification on resource-limited hardware. QANA integrates ghost modules, efficient channel attention, and squeeze-and-excitation blocks to provide robust feature representations while keeping inference low-latency and energy-efficient. Its quantization-aware head and spike-compatible transformations enable seamless conversion to spiking neural networks (SNNs) and deployment on neuromorphic platforms. Evaluations on the large-scale HAM10000 benchmark and a real-world clinical dataset show that QANA achieves 91.6% Top-1 accuracy and 82.4% macro F1 on HAM10000, and 90.8% and 81.7%, respectively, on the clinical dataset, significantly outperforming state-of-the-art CNN-to-SNN models under a fair comparison. Deployed on BrainChip Akida hardware, QANA achieves 1.5 ms inference latency and 1.7 mJ energy per image, reducing inference latency by over 94.6% and energy use by over 98.6% compared with GPU-based CNNs, while also surpassing state-of-the-art CNN-to-SNN conversion baselines. These results demonstrate the effectiveness of QANA for accurate, real-time, and privacy-sensitive medical analysis in edge environments.
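The abstract does not say how the quantization-aware head and spike-compatible transformations are realized, and Akida's own conversion pipeline is not reproduced here. One generic way to see why quantization-aware training eases CNN-to-SNN conversion is the known correspondence between clipped, uniformly quantized ReLU activations and integrate-and-fire (IF) neurons: with n-bit activations in [0, a_max], an IF neuron with threshold a_max, reset by subtraction, and a membrane potential initialized at half the threshold emits round(a(2^n - 1)/a_max) spikes over T = 2^n - 1 timesteps (away from exact half-way ties). The sketch below checks this numerically; `fake_quantize`, `if_firing_rate`, the 4-bit width, and `a_max = 1` are hypothetical choices, not details taken from QANA.

```python
import numpy as np


def fake_quantize(a, n_bits=4, a_max=1.0):
    """Clip activations to [0, a_max] and round them to 2**n_bits - 1 uniform steps.

    This is the forward pass of quantization-aware training; the straight-through
    gradient estimator used during training is omitted.
    """
    step = a_max / (2 ** n_bits - 1)
    return np.round(np.clip(a, 0.0, a_max) / step) * step


def if_firing_rate(a, timesteps, a_max=1.0):
    """Drive integrate-and-fire neurons with constant input a for `timesteps` steps.

    Threshold = a_max, reset by subtraction, membrane initialised at a_max / 2.
    Returns the spike rate rescaled to activation units.
    """
    a = np.asarray(a, dtype=float)
    v = np.full(a.shape, 0.5 * a_max)            # half-threshold initial potential
    spikes = np.zeros(a.shape)
    for _ in range(timesteps):
        v += a                                   # integrate the input current
        fired = v >= a_max                       # at most one spike per step
        spikes += fired
        v -= fired * a_max                       # reset by subtraction keeps the residue
    return spikes / timesteps * a_max


a = np.array([-0.2, 0.0, 0.05, 0.37, 0.52, 0.83, 1.3])
q = fake_quantize(a, n_bits=4)
rate = if_firing_rate(a, timesteps=2 ** 4 - 1)
print(np.allclose(q, rate))  # True
```

The half-threshold initialization is what turns the IF neuron's floor-like spike count into rounding; with the membrane starting at zero, the spike rate would undershoot the quantized activation by up to one level.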
