CACTUS: a Comprehensive Abstraction and Classification Tool for Uncovering Structures
Luca Gherardini, Varun Ravi Varma, Karol Capala, Roger Woods, Jose Sousa
TL;DR
CACTUS addresses explainability and data-scarcity challenges by extending SaNDA with categorical attribute abstractions, memory-efficient on-the-fly graphs, and parallelizable pipelines. It offers two classification modes (PageRank-based and probabilistic SaNDA) and produces knowledge graphs, binary decision trees, and correlation analyses that reveal how attributes drive class separation. Validated on the WDBC and Thyroid datasets, the approach achieves competitive balanced accuracy while remaining interpretable through marker distributions, centrality measures, and graph communities, illustrating the value of category-preserving abstractions for medical and other data domains. Overall, CACTUS demonstrates how structured abstractions and graph-based reasoning can enable secure, explainable analytics on small-to-moderate datasets.
Abstract
The availability of large data sets is providing the impetus for current developments in artificial intelligence. Developing solutions with small data sets remains challenging, however, because of the need for practical, cost-effective deployment and the opacity of deep learning models. The Comprehensive Abstraction and Classification Tool for Uncovering Structures (CACTUS) is presented for improved secure analytics through the effective use of explainable artificial intelligence. It provides additional support for categorical attributes, preserving their original meaning, optimising memory usage, and speeding up computation through parallelisation. It shows the user the frequency of the attributes in each class and ranks them by their discriminative power. Its performance is assessed by application to the Wisconsin diagnostic breast cancer and Thyroid0387 data sets.
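To make the idea of per-class attribute frequencies and a discriminative ranking concrete, the following is a minimal Python sketch. It is not the CACTUS implementation: the function names, the toy records, and the scoring rule (the largest between-class gap in the relative frequency of any attribute value) are assumptions chosen purely for illustration, and the sketch assumes every record carries every attribute.

```python
from collections import Counter, defaultdict


def class_frequencies(records, labels):
    """Return {attribute: {class: {value: relative frequency within that class}}}.

    Hypothetical helper for illustration; assumes every record has every attribute.
    """
    counts = defaultdict(lambda: defaultdict(Counter))
    class_sizes = Counter(labels)
    for record, label in zip(records, labels):
        for attribute, value in record.items():
            counts[attribute][label][value] += 1
    return {
        attribute: {
            label: {value: n / class_sizes[label] for value, n in values.items()}
            for label, values in per_class.items()
        }
        for attribute, per_class in counts.items()
    }


def rank_attributes(frequencies):
    """Rank attributes by an illustrative score: the largest between-class gap
    in the relative frequency of any single attribute value."""
    scores = {}
    for attribute, per_class in frequencies.items():
        values = {v for dist in per_class.values() for v in dist}
        gap = 0.0
        for v in values:
            freqs = [dist.get(v, 0.0) for dist in per_class.values()]
            gap = max(gap, max(freqs) - min(freqs))
        scores[attribute] = gap
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy categorical data (invented for this sketch, not taken from WDBC or Thyroid0387).
records = [
    {"texture": "high", "symmetry": "low"},
    {"texture": "high", "symmetry": "high"},
    {"texture": "low", "symmetry": "low"},
    {"texture": "low", "symmetry": "high"},
]
labels = ["malignant", "malignant", "benign", "benign"]

frequencies = class_frequencies(records, labels)
ranking = rank_attributes(frequencies)
print(ranking)
```

In this toy example the "texture" attribute separates the two classes perfectly while "symmetry" is equally distributed across them, so "texture" ranks first. Any measure of discriminative power could be substituted for the gap score used here.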
