A Deep Learning Framework for Thyroid Nodule Segmentation and Malignancy Classification from Ultrasound Images
Omar Abdelrazik, Mohamed Elsayed, Noorul Wahab, Nasir Rajpoot, Adam Shephard
TL;DR
The paper tackles inter-observer variability in ultrasound-based thyroid nodule risk assessment with a fully automated two-stage framework: TransUNet segments the nodule, and ResNet-18 then classifies malignancy from a region-of-interest crop, so predictions are driven by the nodule itself. Against a hand-crafted-feature baseline (Random Forest on morphological features), the region-focused deep features perform better (F1 ≈ 0.852 vs. 0.829) under 5-fold cross-validation on the 349-image DDTI dataset, with external validation on the TNUI cohort. Key contributions are an end-to-end pipeline for nodule detection and malignancy prediction, and a direct comparison between implicit DL features and explicit morphology features. The approach shows practical potential for trustworthy AI-assisted thyroid nodule assessment, while acknowledging segmentation accuracy and dataset imbalance as key factors for further improvement.
Abstract
Ultrasound-based risk stratification of thyroid nodules is a critical clinical task, but it suffers from high inter-observer variability. While many deep learning (DL) models function as "black boxes," we propose a fully automated, two-stage framework for interpretable malignancy prediction. Our method achieves interpretability by forcing the model to focus only on clinically relevant regions. First, a TransUNet model automatically segments the thyroid nodule. The resulting mask is then used to create a region of interest around the nodule, and this localised image is fed directly into a ResNet-18 classifier. We evaluated our framework using 5-fold cross-validation on a clinical dataset of 349 images, where it achieved a high F1-score of 0.852 for predicting malignancy. To validate its performance, we compared it against a strong baseline using a Random Forest classifier with hand-crafted morphological features, which achieved an F1-score of 0.829. The superior performance of our DL framework suggests that the implicit visual features learned from the localised nodule are more predictive than explicit shape features alone. This is the first fully automated end-to-end pipeline for both detecting thyroid nodules on ultrasound images and predicting their malignancy.
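The hand-off between the two stages (segmentation mask to localised classifier input) can be sketched as a bounding-box crop. The following is a minimal illustration only, not the authors' implementation: the function name `crop_roi`, the relative `margin` padding, and the fall-back to the full image when the mask is empty are all assumptions, since the abstract states only that the mask is used to create a region of interest around the nodule.

```python
import numpy as np


def crop_roi(image, mask, margin=0.1):
    """Crop `image` to the bounding box of the non-zero pixels in `mask`.

    `margin` is a hypothetical padding, given as a fraction of the box
    height/width, added on every side and clipped to the image bounds.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        # Assumption: with no segmented nodule, pass the whole image on.
        return image

    # Tight bounding box around the mask (end indices are exclusive).
    y0, y1 = int(ys.min()), int(ys.max()) + 1
    x0, x1 = int(xs.min()), int(xs.max()) + 1

    # Pad the box proportionally to its own size.
    pad_y = int(round(margin * (y1 - y0)))
    pad_x = int(round(margin * (x1 - x0)))

    # Clip the padded box to the image extent.
    h, w = mask.shape[:2]
    y0, y1 = max(0, y0 - pad_y), min(h, y1 + pad_y)
    x0, x1 = max(0, x0 - pad_x), min(w, x1 + pad_x)

    return image[y0:y1, x0:x1]
```

In a pipeline of this shape, the cropped array would then be resized and normalised to the classifier's expected input before being passed to ResNet-18; those preprocessing details are not given in the abstract and are left out here.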
