Texture Feature Analysis for Classification of Early-Stage Prostate Cancer in mpMRI
Asmail Muftah, S M Schirmer, Frank C Langbein
TL;DR
This study addresses the challenge of explainability in mpMRI-based prostate cancer classification by evaluating traditional machine learning models (Random Forest and SVM) on handcrafted radiomic features extracted from T2W, ADC, and high-b-value DWI in the Prostate-X dataset and an in-house early-stage dataset. Using Sequential Backward Floating Feature Selection, correlation analysis, and SHAP values, the authors identify a small subset of features, predominantly first-order statistics and a limited set of Haralick textures, that largely determines model outputs; Local Binary Patterns contribute little. RF classifiers achieve the best AUC across datasets, and the same feature subsets recur across the two disparate cohorts, suggesting potential for streamlined, explainable classifiers. The findings highlight the value of explainable AI in radiomics and point to the need for larger, more diverse datasets to validate generalizability and to inform the design of robust, interpretable biomarkers for early-stage PCa identification.
Abstract
Magnetic resonance imaging (MRI) has become a crucial tool in the diagnosis and staging of prostate cancer (PCa), owing to its superior tissue contrast. However, it also creates large volumes of data that must be assessed by trained experts, a time-consuming and laborious task. This has prompted the development of machine learning tools to automate PCa risk classification based on multiple MRI modalities (T2W, ADC, and high-b-value DWI). Understanding and interpreting the predictions made by the models, however, remains a challenge. We analyze Random Forests (RF) and Support Vector Machines (SVM) on two complementary datasets, the public Prostate-X dataset and an in-house, mostly early-stage PCa dataset, to elucidate the contributions made by first-order statistical features, Haralick texture features, and local binary patterns to the classification. Using correlation analysis and Shapley impact scores, we find that many of the commonly used features are strongly correlated and that the majority of features have negligible impact on the classification. We identify a small set of features that determine the classification outcome, which may aid the development of explainable AI approaches.
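
To illustrate what a Shapley impact score measures, the following is a minimal sketch that computes exact Shapley values for a toy scoring function by enumerating all feature coalitions. It is not the authors' pipeline: the scoring function, the feature names (`t2w_mean`, `adc_entropy`, `lbp_bin`), the sample values, and the all-zero baseline are hypothetical and chosen only so the arithmetic is easy to follow. Exact enumeration is feasible only for a handful of features; for real RF or SVM models, approximate estimators are typically used instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float.

    Cost grows as 2**n_features, so this is only practical for toy examples.
    """
    phi = [0.0] * n_features
    n_fact = factorial(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n_features - size - 1) / n_fact
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal gain from adding feature i to the coalition.
                gain = value_fn(coalition | {i}) - value_fn(coalition)
                phi[i] += weight * gain
    return phi


# Hypothetical feature names and values (assumptions, not data from the study).
FEATURES = ["t2w_mean", "adc_entropy", "lbp_bin"]
x = [1.0, 2.0, 5.0]          # the sample being explained
baseline = [0.0, 0.0, 0.0]   # reference values used for "absent" features


def toy_score(v):
    """Hypothetical model: one main effect, one interaction, one ignored feature."""
    return 2.0 * v[0] + 1.0 * v[0] * v[1] + 0.0 * v[2]


def coalition_value(coalition):
    """Score with features in the coalition taken from x, the rest from baseline."""
    mixed = [x[j] if j in coalition else baseline[j] for j in range(len(x))]
    return toy_score(mixed)


phi = shapley_values(coalition_value, len(x))
for name, value in zip(FEATURES, phi):
    print(f"{name}: {value:+.3f}")
```

In this toy setting the feature that the scoring function ignores receives an attribution of zero, and the attributions sum to the difference between the score of the sample and the score of the baseline. These two properties are what make it possible to read negligible scores as negligible impact on the classification.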
