Texture Feature Analysis for Classification of Early-Stage Prostate Cancer in mpMRI

Asmail Muftah, S M Schirmer, Frank C Langbein

TL;DR

This study addresses the challenge of explainability in mpMRI-based prostate cancer classification by evaluating traditional machine learning models (Random Forest and SVM) on handcrafted radiomic features extracted from T2W, ADC, and high-b-value DWI images of the Prostate-X dataset and an in-house early-stage dataset. Using Sequential Backward Floating Feature Selection, correlation analysis, and SHAP values, the authors identify a small subset of features, predominantly first-order statistics and a limited set of Haralick texture features, that largely determine model outputs, while Local Binary Patterns contribute little. RF classifiers achieve the best AUC across the datasets, and the same feature subsets recur across disparate cohorts, suggesting potential for streamlined, explainable classifiers. The findings highlight the value of explainable AI in radiomics and point to the need for larger, more diverse datasets to validate generalizability and to inform the design of robust, interpretable biomarkers for early-stage PCa identification.
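The backward floating search named above can be sketched in plain Python. This is a minimal, model-agnostic sketch, not the authors' implementation: `score` is a hypothetical callable that, in a setting like the paper's, would return something like the cross-validated AUC of an RF or SVM trained on the given feature subset. The feature names and the toy additive score in the usage lines are invented for illustration. The backward step drops the feature whose removal hurts the score least; the floating step re-adds a previously dropped feature only if that strictly beats the best subset already recorded at the larger size.

```python
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

Subset = FrozenSet[str]


def sbfs(
    features: Sequence[str],
    score: Callable[[Subset], float],
    k_min: int = 1,
) -> Dict[int, Tuple[Subset, float]]:
    """Sequential Backward Floating Selection (minimal sketch).

    Returns, for every subset size visited, the best (subset, score) found.
    `score` is assumed to be "higher is better" (e.g. a validation AUC).
    """
    all_features = frozenset(features)
    current = all_features
    best: Dict[int, Tuple[Subset, float]] = {len(current): (current, score(current))}

    while len(current) > k_min:
        # Backward (exclusion) step: remove the feature whose removal keeps
        # the score highest. Sorting makes tie-breaking deterministic.
        current = max((current - {f} for f in sorted(current)), key=score)
        s = score(current)
        if len(current) not in best or s > best[len(current)][1]:
            best[len(current)] = (current, s)

        # Floating (conditional inclusion) step: re-add an excluded feature
        # only while that strictly improves on the best subset of that size.
        while True:
            excluded = sorted(all_features - current)
            if not excluded:
                break
            grown = max((current | {f} for f in excluded), key=score)
            g = score(grown)
            if g > best[len(grown)][1]:
                current = grown
                best[len(current)] = (current, g)
            else:
                break

    return best


# Hypothetical usage: integer "usefulness" weights minus a per-feature penalty
# stand in for a real classifier's validation score.
weights = {"t2_mean": 30, "adc_mean": 25, "contrast": 10, "lbp_0": 1}


def toy_score(subset: Subset) -> float:
    return sum(weights[f] for f in subset) - 5 * len(subset)


best = sbfs(list(weights), toy_score)
print(sorted(best[2][0]), best[2][1])  # ['adc_mean', 't2_mean'] 45
```

For real experiments, the `mlxtend` package's `SequentialFeatureSelector` with `forward=False, floating=True` performs this kind of search around a scikit-learn estimator, so a hand-written loop like this one mainly serves to make the procedure explicit.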

Abstract

Magnetic resonance imaging (MRI) has become a crucial tool in the diagnosis and staging of prostate cancer, owing to its superior tissue contrast. However, it also generates large volumes of data that must be assessed by trained experts, a time-consuming and laborious task. This has prompted the development of machine learning tools to automate prostate cancer (PCa) risk classification based on multiple MRI modalities (T2W, ADC, and high-b-value DWI). Understanding and interpreting the predictions made by these models, however, remains a challenge. We analyze Random Forests (RF) and Support Vector Machines (SVM) on two complementary datasets, the public Prostate-X dataset and an in-house, mostly early-stage PCa dataset, to elucidate the contributions of first-order statistical features, Haralick texture features, and local binary patterns to the classification. Using correlation analysis and Shapley impact scores, we find that many of the commonly used features are strongly correlated and that the majority of features have negligible impact on the classification. We identify a small set of features that determine the classification outcome, which may aid the development of explainable AI approaches.
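The observation that many standard features are strongly correlated can be checked with a simple Pearson correlation screen over per-patch feature values. The sketch below is illustrative only: the feature names and values are hypothetical, the 0.9 threshold is an arbitrary choice, and the greedy grouping (each unassigned feature seeds a group that absorbs later features with |r| at or above the threshold) is one simple scheme, not necessarily the analysis used in the paper.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # constant feature: r is undefined
    return sxy / math.sqrt(sxx * syy)


def correlated_groups(
    features: Dict[str, Sequence[float]], threshold: float = 0.9
) -> List[List[str]]:
    """Greedily group features by |r| >= threshold against each group's seed."""
    names = list(features)
    assigned = set()
    groups: List[List[str]] = []
    for i, seed in enumerate(names):
        if seed in assigned:
            continue
        group = [seed]
        assigned.add(seed)
        for other in names[i + 1:]:
            if other in assigned:
                continue
            r = pearson_r(features[seed], features[other])
            if not math.isnan(r) and abs(r) >= threshold:
                group.append(other)
                assigned.add(other)
        groups.append(group)
    return groups


# Hypothetical per-patch values for four features (five patches).
patch_features = {
    "t2_mean": [1.0, 2.0, 3.0, 4.0, 5.0],
    "t2_median": [1.1, 2.0, 2.9, 4.2, 5.0],
    "t2_energy": [5.0, 4.0, 3.0, 2.0, 1.0],
    "lbp_3": [2.0, 1.0, 2.0, 1.0, 2.0],
}
groups = correlated_groups(patch_features)
print(groups)  # [['t2_mean', 't2_median', 't2_energy'], ['lbp_3']]
```

Keeping one representative per group is a common way to shrink a redundant feature set; in practice `pandas.DataFrame.corr` or `numpy.corrcoef` would compute the full correlation matrix of the kind shown in Figure 6.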

Paper Structure (9 sections, 8 figures)

This paper contains 9 sections, 8 figures.

Figures (8)

  • Figure 1: Traditional machine learning classification pipeline.
  • Figure 2: AUC of the best traditional classifiers (RF and SVM with SBSF) and deep learning classifiers (CNNs trained on different input combinations: T2W, T2W+ADC, T2W+ADC+DWI, T2W+DWI) on the Prostate-X and in-house datasets, showing that traditional RF classifiers consistently perform best.
  • Figure 5: Feature value ranges for positive and negative patches differ but generally overlap for all datasets, as illustrated here for the mean values of the T2W patches. Left: Prostate-X, middle: in-house dataset, right: combined dataset. Red color indicates strong correlation in terms of Pearson $r$.
  • Figure 6: Feature value correlation matrix for the Prostate-X dataset. The strength of correlation is indicated by the color, with yellow indicating high correlation and blue effectively none. The plot shows that many features are correlated but there are groups of uncorrelated features. The white line corresponds to a feature (for high-b-value DWI) that could not be computed.
  • Figure 7: Shapley value distributions and decision trees for the best RF classifier with SBSF feature selection for the respective datasets.
  • ...and 3 more figures
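The Shapley values summarized in Figure 7 attribute a single prediction to individual features. For a handful of features they can be computed exactly from the definition $\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr)$, where $N$ is the feature set, $n = |N|$, and $v(S)$ is the model output when only the features in $S$ are known. The sketch below enumerates every coalition; the value function `toy_value` is hypothetical, and how $v$ treats unknown features (a fixed baseline, an average over background data, and so on) is an assumption that differs between SHAP estimators. Libraries such as `shap` avoid this exponential enumeration; its `TreeExplainer`, for instance, computes Shapley values efficiently for tree ensembles like random forests.

```python
import math
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every coalition (O(n * 2^n) calls)."""
    n = len(players)
    phi: Dict[str, float] = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1 that exclude p
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_value(present: FrozenSet[str]) -> float:
    """Hypothetical model output for one patch, given which features are known."""
    out = 0.2  # baseline output with no features known
    if "t2_mean" in present:
        out += 0.4
    if "contrast" in present:
        out += 0.1
    if {"t2_mean", "contrast"} <= present:
        out += 0.2  # interaction term, split equally between the two features
    return out


features = ["t2_mean", "contrast", "lbp_0"]
phi = shapley_values(features, toy_value)
for name, val in phi.items():
    print(f"{name}: {val:.3f}")
# Expected output:
# t2_mean: 0.500
# contrast: 0.200
# lbp_0: 0.000
```

The attributions sum to $v(N) - v(\emptyset) = 0.7$ (the efficiency property), which is what makes Shapley values usable as additive per-feature impact scores; a feature the model ignores, like `lbp_0` here, receives exactly zero.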