Thyroidiomics: An Automated Pipeline for Segmentation and Classification of Thyroid Pathologies from Scintigraphy Images

Maziar Sabouri, Shadab Ahamed, Azin Asadzadeh, Atlas Haddadi Avval, Soroush Bagheri, Mohsen Arabi, Seyed Rasoul Zakavi, Emran Askari, Ali Rasouli, Atena Aghaee, Mohaddese Sehati, Fereshteh Yousefirizi, Carlos Uribe, Ghasem Hajianfar, Habib Zaidi, Arman Rahmim

TL;DR

Thyroidiomics tackles automated thyroid pathology classification from scintigraphy by integrating ResUNet-based segmentation with radiomics-driven classification in a two-step pipeline. The study compares classification from physician-delineated ROIs (scenario 1) against a fully automated approach using ResUNet-predicted ROIs (scenario 2) under leave-one-center-out cross-validation across nine centers. Segmentation reached Dice scores between 0.71 and 0.86 depending on the class, and classification reached a ROC AUC of 0.92 in scenario 1 versus 0.90 in scenario 2, with the TH class consistently strong and the automated pipeline achieving metrics comparable to the physician-based approach. The work highlights the potential for faster, more reproducible thyroid assessment and lays the groundwork for future multimodal extensions and segmentation-free classification strategies.

Abstract

The objective of this study was to develop an automated pipeline that enhances thyroid disease classification using thyroid scintigraphy images, aiming to decrease assessment time and increase diagnostic accuracy. Anterior thyroid scintigraphy images from 2,643 patients were collected and categorized into diffuse goiter (DG), multinodular goiter (MNG), and thyroiditis (TH) based on clinical reports, and then segmented by an expert. A ResUNet model was trained to perform auto-segmentation. Radiomic features were extracted from both physician segmentations (scenario 1) and ResUNet segmentations (scenario 2). Highly correlated features were then omitted using Spearman's correlation, and the remaining features were selected using Recursive Feature Elimination (RFE) with XGBoost as the core estimator. All models were trained under a leave-one-center-out cross-validation (LOCOCV) scheme, in which nine instances of each algorithm were iteratively trained and validated on data from eight centers and tested on the ninth, for both scenarios separately. Segmentation performance was assessed using the Dice similarity coefficient (DSC), while classification performance was assessed using precision, recall, F1-score, accuracy, area under the receiver operating characteristic curve (ROC AUC), and area under the precision-recall curve (PRC AUC). ResUNet achieved DSC values of 0.84$\pm$0.03, 0.71$\pm$0.06, and 0.86$\pm$0.02 for MNG, TH, and DG, respectively. Classification in scenario 1 achieved an accuracy of 0.76$\pm$0.04 and a ROC AUC of 0.92$\pm$0.02, while classification in scenario 2 yielded an accuracy of 0.74$\pm$0.05 and a ROC AUC of 0.90$\pm$0.02. The automated pipeline demonstrated performance comparable to physician segmentations on several classification metrics across different classes, effectively reducing assessment time while maintaining high diagnostic accuracy. Code available at: https://github.com/ahxmeds/thyroidiomics.git.
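The leave-one-center-out scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `leave_one_center_out` and the toy center labels are hypothetical, and the further split of the training centers into training and validation subsets is omitted because the abstract does not specify it.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_center_out(
    centers: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_center, train_idx, test_idx), one fold per distinct center."""
    # Group sample indices by the center they came from.
    by_center: Dict[str, List[int]] = defaultdict(list)
    for idx, center in enumerate(centers):
        by_center[center].append(idx)

    # Each center is held out once as the test set; all others form the training set.
    for held_out in sorted(by_center):
        test_idx = by_center[held_out]
        train_idx = sorted(
            i for c, idxs in by_center.items() if c != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical center labels for six patients from three centers
# (the study itself uses nine centers, giving nine folds).
centers = ["A", "A", "B", "C", "C", "B"]
folds = list(leave_one_center_out(centers))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

With nine distinct center labels the generator yields nine folds, matching the nine model instances mentioned above. In a scikit-learn workflow, `sklearn.model_selection.LeaveOneGroupOut` provides the same kind of split when the center label is passed as `groups`.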

Paper Structure (12 sections, 4 figures, 2 tables)


Figures (4)

  • Figure 1: Thyroidiomics: the proposed two-step pipeline to classify thyroid pathologies into three classes, namely MNG, TH and DG. Scenario 1 represents the pipeline that depends on physician-delineated ROIs as input to the classifier, while scenario 2 represents the fully automated pipeline operating on segmentations predicted by ResUNet.
  • Figure 2: Class-wise and averaged classification metrics used to evaluate model performance in two scenarios: features extracted from the physician-delineated ROIs and features extracted from the ResUNet-predicted ROIs. The boxplots show the distribution of metrics over the nine centers as test sets for the three thyroid pathology classes, MNG, TH and DG. The black horizontal lines denote the median and the white circles denote the mean of each distribution.
  • Figure 3: (a) Distribution of center-level mean DSC over the nine centers for the classes MNG, TH and DG. (b)-(d), (e)-(g), and (h)-(j) show representative images from each class with the ground-truth (red) and ResUNet-predicted (yellow) segmentations of the thyroid. The DSC between the ground-truth and predicted masks is shown in the bottom-right of each panel.
  • Figure 4: Distribution of counts within the ground truth thyroid masks for the three thyroid pathology classes, MNG, TH and DG.
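The DSC reported in Figure 3 compares a ground-truth mask with a predicted mask. A minimal sketch of the standard definition, DSC = 2|A ∩ B| / (|A| + |B|), is given below. The function name `dice_coefficient`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def dice_coefficient(gt: np.ndarray, pred: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks of equal shape."""
    gt = gt.astype(bool)
    pred = pred.astype(bool)
    denom = gt.sum() + pred.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    intersection = np.logical_and(gt, pred).sum()
    return float(2.0 * intersection / denom)


# Toy 2x3 masks (hypothetical): 3 foreground pixels each, 2 of them shared.
gt = np.array([[0, 1, 1],
               [0, 1, 0]])
pred = np.array([[0, 1, 0],
                 [0, 1, 1]])
dsc = dice_coefficient(gt, pred)
print(round(dsc, 3))  # 2 * 2 / (3 + 3) = 0.667
```

Averaging this value over the images of one test center would give a center-level mean DSC of the kind plotted in panel (a).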