Quantization-Aware Neuromorphic Architecture for Efficient Skin Disease Classification on Resource-Constrained Devices

Haitian Wang, Xinyu Wang, Yiren Wang, Zichen Geng, Xian Zhang, Yu Zhang, Bo Miao

TL;DR

The paper tackles the challenge of accurate skin lesion classification on privacy-sensitive, resource-constrained edge devices. It introduces QANA, a quantization-aware neuromorphic framework that unifies Ghost-based feature extraction, SA-ECA attention, SE recalibration, and spike-compatible transformations to enable CNN-to-SNN deployment on BrainChip Akida. Empirical results on HAM10000 and a clinical dataset show superior accuracy and dramatic reductions in latency and energy compared with GPU-based CNNs and prior CNN-to-SNN methods, including 1.5 ms per image and 1.7 mJ per image on Akida. The approach demonstrates practical viability for real-time, on-device dermatology with incremental learning capabilities and robust performance on imbalanced data, enhancing privacy-preserving medical inference at the edge.

Abstract

Accurate and efficient skin lesion classification on edge devices is critical for accessible dermatological care but remains challenging due to computational, energy, and privacy constraints. We introduce QANA, a novel quantization-aware neuromorphic architecture for incremental skin lesion classification on resource-limited hardware. QANA effectively integrates ghost modules, efficient channel attention, and squeeze-and-excitation blocks for robust feature representation with low-latency and energy-efficient inference. Its quantization-aware head and spike-compatible transformations enable seamless conversion to spiking neural networks (SNNs) and deployment on neuromorphic platforms. Evaluation on the large-scale HAM10000 benchmark and a real-world clinical dataset shows that QANA achieves 91.6% Top-1 accuracy and 82.4% macro F1 on HAM10000, and 90.8% Top-1 accuracy and 81.7% macro F1 on the clinical dataset, significantly outperforming state-of-the-art CNN-to-SNN models under fair comparison. Deployed on BrainChip Akida hardware, QANA achieves 1.5 ms inference latency and 1.7 mJ energy per image, reducing inference latency and energy use by over 94.6% and 98.6%, respectively, compared to GPU-based CNNs, while also surpassing state-of-the-art CNN-to-SNN conversion baselines. These results demonstrate the effectiveness of QANA for accurate, real-time, and privacy-sensitive medical analysis in edge environments.

Paper Structure

This paper contains 31 sections, 18 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Detailed architecture of our end-to-end framework for quantization-aware neuromorphic skin lesion classification: (1) data preprocessing (quality filtering, augmentation, and SMOTE-based oversampling); (2) a novel quantization-aware network for feature extraction and spike-compatible transformation; (3) CNN-to-SNN conversion with operator mapping and temporal spike encoding; and (4) SNN deployment with on-chip optimization for real-time and energy-efficient inference on edge hardware.
  • Figure 2: Detailed architecture of QANA, which performs iterative feature extraction using stacked Ghost modules, ECA, and residual blocks, followed by a spike-compatible transformation consisting of batch normalization, ReLU activation, and a Squeeze-and-Excitation (SE) block. The output is then quantized and projected to class logits for SNN deployment.
  • Figure 3: Schematic of the Ghost module. The input feature map is first processed by a lightweight convolution to extract a reduced set of primary features with channel size $\mu C$, where $C$ is the target output dimensionality and $\mu \in (0,1)$ is a tunable ratio. Subsequently, inexpensive operations are applied to the primary features to generate additional ghost features of size $(1-\mu)C$. These are concatenated along the channel axis to form the final output of size $C$.
  • Figure 4: Illustration of the Spatially-Aware ECA (SA-ECA) block. A depthwise convolution is first applied to extract channel-wise statistics, followed by a lightweight 1D convolution to model local channel dependencies. The resulting attention weights are used to rescale the input feature channels, enhancing discriminative information with minimal computational overhead.
  • Figure 5: Illustration of the Squeeze-and-Excitation (SE) block. The input feature map undergoes global pooling, followed by two fully connected layers with ReLU and sigmoid activations to compute channel-wise weights. The original feature map is then rescaled by these weights, enabling adaptive recalibration of channel responses.
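
The channel bookkeeping described in the captions of Figures 3 and 5 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`ghost_channel_split`, `ghost_module`, `se_block`), the 1x1 primary convolution, the per-channel scaling used as the "inexpensive operation", and the SE reduction to 4 hidden units are all hypothetical choices made for the example.

```python
import numpy as np

def ghost_channel_split(C, mu):
    """Split a target width C into mu*C primary and (1-mu)*C ghost channels."""
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie in (0, 1)")
    n_primary = min(max(int(round(mu * C)), 1), C - 1)
    return n_primary, C - n_primary

def ghost_module(x, w_primary, ghost_scale):
    """Ghost module sketch for one feature map x of shape (C_in, H, W).

    w_primary:   (n_primary, C_in) weights of a 1x1 convolution (assumed).
    ghost_scale: (n_ghost,) per-channel scales, a stand-in for the
                 inexpensive operation (assumption; often a depthwise conv).
    """
    # Lightweight convolution producing the primary features.
    primary = np.einsum("oc,chw->ohw", w_primary, x)
    # Each ghost channel is derived cheaply from one primary channel.
    src = np.arange(ghost_scale.shape[0]) % primary.shape[0]
    ghost = primary[src] * ghost_scale[:, None, None]
    # Concatenate along the channel axis to reach C output channels.
    return np.concatenate([primary, ghost], axis=0)

def se_block(x, w1, w2):
    """Squeeze-and-Excitation sketch for x of shape (C, H, W)."""
    s = x.mean(axis=(1, 2))                  # squeeze: global average pooling
    z = np.maximum(w1 @ s, 0.0)              # first FC layer + ReLU
    a = 1.0 / (1.0 + np.exp(-(w2 @ z)))      # second FC layer + sigmoid
    return x * a[:, None, None]              # rescale each channel

# Usage with illustrative sizes: C = 16 output channels, mu = 0.5.
rng = np.random.default_rng(0)
C, mu, C_in = 16, 0.5, 4
n_primary, n_ghost = ghost_channel_split(C, mu)
x = rng.standard_normal((C_in, 6, 6))
y = ghost_module(x, rng.standard_normal((n_primary, C_in)),
                 rng.standard_normal(n_ghost))
out = se_block(y, rng.standard_normal((4, C)), rng.standard_normal((C, 4)))
print(y.shape, out.shape)
```

Because the SE weights pass through a sigmoid, every channel of `out` is a scaled-down copy of the corresponding channel of `y`; the block reweights channels rather than mixing them.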