ThyroidEffi 1.0: A Cost-Effective System for High-Performance Multi-Class Thyroid Carcinoma Classification

Hai Pham-Ngoc, De Nguyen-Van, Dung Vu-Tien, Phuong Le-Hong

TL;DR

ThyroidEffi 1.0 addresses the need for accurate, low-cost, multi-class classification of thyroid FNAB images into three management-relevant groups. It combines YOLOv10-based cell-cluster detection, curriculum learning, a lightweight EfficientNetB0 backbone, and a Transformer-inspired multi-region module to achieve strong performance with minimal hardware demands. On internal data (n=1,804), it reaches macro F1 of 89.19% (Basic) to 89.77% (Premium) and AUCs up to 0.98 (Benign) on the test set; on external validation (n=1,015), AUCs range from 0.7436 (Indeterminate/Suspicious) to 0.9495 (Benign). It runs real-time inference on standard CPUs and provides Grad-CAM visualizations for interpretability. The system is deployed in Vietnam, demonstrating practical feasibility and laying the groundwork for continuous learning and expansion to other cytopathology domains.

Abstract

Background: Automated classification of thyroid Fine Needle Aspiration Biopsy (FNAB) images faces challenges from limited data, inter-observer variability, and computational cost. Efficient, interpretable models are crucial for clinical support. Objective: To develop and externally validate a deep learning system for multi-class thyroid FNAB image classification into three key categories that directly guide post-biopsy treatment in Vietnam: Benign (Bethesda II), Indeterminate/Suspicious (Bethesda I, III, IV, V), and Malignant (Bethesda VI), achieving high diagnostic accuracy with low computational overhead. Methods: Our pipeline features: (1) YOLOv10 cell cluster detection for informative sub-region extraction and noise reduction; (2) curriculum learning that sequences localized crops to full images for multi-scale capture; (3) an adaptive lightweight EfficientNetB0 (4M parameters) balancing performance and efficiency; and (4) a Transformer-inspired module for multi-scale, multi-region analysis. External validation used 1,015 independent FNAB images. Results: ThyroidEffi Basic achieved a macro F1 of 89.19% and AUCs of 0.98 (Benign), 0.95 (Indeterminate/Suspicious), and 0.96 (Malignant) on the internal test set. External validation yielded AUCs of 0.9495 (Benign), 0.7436 (Indeterminate/Suspicious), and 0.8396 (Malignant). ThyroidEffi Premium improved macro F1 to 89.77%. Grad-CAM highlighted key diagnostic regions, confirming interpretability. The system processed 1,000 cases in 30 seconds (about 30 ms per case), demonstrating feasibility on widely accessible hardware. Conclusions: This work demonstrates that high-accuracy, interpretable thyroid FNAB image classification is achievable with minimal computational demands.
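
The abstract reports macro F1 and per-class AUCs over three groups derived from Bethesda categories. The following is a minimal sketch of how such metrics can be computed, not the authors' evaluation code. The grouping dictionary follows the abstract; the names `BETHESDA_TO_GROUP`, `macro_f1`, and `one_vs_rest_auc`, as well as the toy labels and scores, are assumptions made for illustration only.

```python
# Grouping as stated in the abstract: Bethesda II is Benign,
# Bethesda I/III/IV/V are Indeterminate/Suspicious, Bethesda VI is Malignant.
BETHESDA_TO_GROUP = {
    "II": "Benign",
    "I": "Indeterminate/Suspicious",
    "III": "Indeterminate/Suspicious",
    "IV": "Indeterminate/Suspicious",
    "V": "Indeterminate/Suspicious",
    "VI": "Malignant",
}
CLASSES = ("Benign", "Indeterminate/Suspicious", "Malignant")


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of per-class F1 scores."""
    per_class = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


def one_vs_rest_auc(y_true, scores, positive):
    """AUC of one class against all others.

    Computed as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    bethesda = ["II", "II", "III", "V", "VI", "VI"]
    y_true = [BETHESDA_TO_GROUP[b] for b in bethesda]
    y_pred = ["Benign", "Benign", "Indeterminate/Suspicious",
              "Malignant", "Malignant", "Malignant"]
    malignant_score = [0.1, 0.2, 0.3, 0.6, 0.9, 0.5]
    print(round(macro_f1(y_true, y_pred), 4))                     # 0.8222
    print(one_vs_rest_auc(y_true, malignant_score, "Malignant"))  # 0.875
```

Macro averaging weights the three groups equally regardless of how many images each contains, which is why a weaker class such as Indeterminate/Suspicious can pull the headline score down even when Benign is classified almost perfectly.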

Paper Structure

This paper contains 26 sections, 5 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Graphical Abstract
  • Figure 2: Test results from the few studies that classify more than two classes. From left to right: the first [or2_wsi] and second [or7_wsi] are from the United States research group, and the third [or13_wsi] is from the China research group.
  • Figure 3: Workflow of thyroid nodule diagnosis, from clinical examination and ultrasound-guided biopsy to slide image analysis (our research scope).
  • Figure 4: Visual overview of our methodology, encompassing data pre-processing and model training with novel techniques (M1-M4). Note that the techniques described in M1 were applied only to the training dataset, so M2 is relevant only during the training phase. The internal validation, test, and external validation datasets were strictly reserved for their designated purposes: preventing model overfitting (internal validation) and evaluating the trained model's performance (test and external validation). Following model development and testing before July 2024 at 108 Military Central Hospital, the model entered a deployment phase with extended testing on an independent dataset at Hung Viet Hospital.
  • Figure 5: Visualization of data distribution before and after applying the ThyroidEffi Basic model for classifying the Benign, Indeterminate/Suspicious, and Malignant categories. Top row: original dataset reduced to 2D using Principal Component Analysis (PCA) [dr_PCA], t-distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP) [dr_tSNE_UMAP], showing overlapping clusters and poor separability among classes. Bottom row: dataset transformed by ThyroidEffi Basic into a 3-dimensional latent space (representing probabilities for Benign, Indeterminate/Suspicious, and Malignant) and reduced to 2D. Clear cluster separation highlights the model’s ability to extract discriminative features.
  • ...and 1 more figure