Addressing Challenges in Data Quality and Model Generalization for Malaria Detection

Kiswendsida Kisito Kabore, Desire Guel

TL;DR

The study analyzes data quality and model generalization challenges in AI-driven malaria detection, identifying imbalanced data, limited diversity, annotation variability, and regional biases as core obstacles. It advocates a multifaceted approach—GAN-based augmentation, transfer learning, domain adaptation, cross-validation on diverse datasets, and global collaborative datasets—to bolster robustness across varied populations and settings. The work highlights the importance of explainable AI and mobile deployment to enable trusted, scalable diagnostics in resource-limited regions. Collectively, these contributions offer a practical roadmap for developing equitable, accurate malaria diagnostics that translate from research to real-world impact.

Abstract

Malaria remains a significant global health burden, particularly in resource-limited regions where timely and accurate diagnosis is critical to effective treatment and control. Deep Learning (DL) has emerged as a transformative tool for automating malaria detection, offering high accuracy and scalability. However, the effectiveness of these models is constrained by challenges in data quality and model generalization, including imbalanced datasets, limited diversity, and annotation variability. These issues reduce diagnostic reliability and hinder real-world applicability. This article provides a comprehensive analysis of these challenges and their implications for malaria detection performance. Key findings highlight the impact of data imbalances, which can lead to a 20% drop in F1-score, and of regional biases, which significantly hinder model generalization. Among the proposed solutions, GAN-based augmentation improved accuracy by 15-20% by generating synthetic data to balance classes and enhance dataset diversity. Domain adaptation techniques, including transfer learning, further improved cross-domain robustness, with sensitivity gains of up to 25%. Additionally, the development of diverse global datasets and collaborative data-sharing frameworks is emphasized as a cornerstone for equitable and reliable malaria diagnostics. The role of explainable AI techniques in improving clinical adoption and trustworthiness is also underscored. By addressing these challenges, this work advances the field of AI-driven malaria detection and provides actionable insights for researchers and practitioners. The proposed solutions aim to support the development of accessible and accurate diagnostic tools, particularly for resource-constrained populations.

Paper Structure

This paper contains 21 sections, 17 figures, 17 tables.

Figures (17)

  • Figure 1: Workflow of data quality challenges and solutions.
  • Figure 2: Data preprocessing pipeline for addressing data quality issues.
  • Figure 3: Class distribution in a typical malaria dataset.
  • Figure 4: Performance Metrics Comparison Across Different Dataset Types.
  • Figure 5: Factors Contributing to Dataset Diversity.
  • ...and 12 more figures