AnoMalNet: Outlier Detection based Malaria Cell Image Classification Method Leveraging Deep Autoencoder
Aminul Huq, Md Tanzim Reza, Shahriar Hossain, Shakib Mahmud Dipto
TL;DR
The paper addresses binary malaria parasite detection in cell images under extreme class imbalance. It proposes AnoMalNet, a convolutional autoencoder trained only on uninfected cells, which labels an image as infected when its reconstruction loss exceeds a threshold (threshold = $\mu + 3\sigma$). On an NIH malaria cell image dataset, AnoMalNet outperforms large CNNs, reaching an accuracy of 98.49%, precision of 97.07%, recall of 100%, and F1 score of 98.52%, which demonstrates the effectiveness of outlier-based detection in imbalanced scenarios. The approach offers a practical alternative when disease-positive samples are scarce, and it has potential for deployment on edge devices because the model is lightweight and needs only normal-class data during training.
Abstract
Class imbalance is a pervasive issue in disease classification from medical images. A model generally needs a balanced class distribution during training to produce reliable results. However, for rare diseases, images from affected patients are much harder to obtain than images from non-affected patients, which results in unwanted class imbalance. Various approaches to tackling class imbalance have been explored so far, each with its own drawbacks. In this research, we propose an outlier-detection-based binary medical image classification technique that can handle even the most extreme case of class imbalance. We use a dataset of malaria-parasitized and uninfected cells. An autoencoder model named AnoMalNet is first trained on only the uninfected cell images and then used to classify both affected and non-affected cell images by thresholding a loss value. We achieve an accuracy, precision, recall, and F1 score of 98.49%, 97.07%, 100%, and 98.52% respectively, performing better than large deep learning models and other published works. Because the proposed approach provides competitive results without needing disease-positive samples during training, it should prove useful for binary disease classification on imbalanced datasets.
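The thresholding step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the per-image loss is assumed to be mean squared error, and $\mu$ and $\sigma$ are assumed to be the mean and standard deviation of the reconstruction losses measured on uninfected training images. The autoencoder itself is omitted; only the decision rule $\mu + 3\sigma$ is shown.

```python
import numpy as np


def reconstruction_loss(images, reconstructions):
    """Per-image mean squared error (assumed loss; the source does not name one)."""
    diff = np.asarray(images, dtype=float) - np.asarray(reconstructions, dtype=float)
    # Flatten every image to one row, then average the squared error per image.
    return (diff ** 2).reshape(len(diff), -1).mean(axis=1)


def fit_threshold(normal_losses, k=3.0):
    """Return mu + k * sigma over losses from uninfected images (k = 3 as in the text)."""
    losses = np.asarray(normal_losses, dtype=float)
    return losses.mean() + k * losses.std()


def classify(losses, threshold):
    """Label an image as infected (1) if its loss exceeds the threshold, else 0."""
    return (np.asarray(losses, dtype=float) > threshold).astype(int)


if __name__ == "__main__":
    # Illustrative loss values only; these are not results from the paper.
    uninfected_train_losses = [0.010, 0.012, 0.011, 0.009, 0.013]
    threshold = fit_threshold(uninfected_train_losses)
    test_losses = [0.012, 0.050]
    print("threshold:", threshold)
    print("labels:", classify(test_losses, threshold))
```

Because the threshold is estimated from uninfected images alone, no infected samples are needed until evaluation, which is the property the abstract relies on for extreme class imbalance.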
