CACTUS: a Comprehensive Abstraction and Classification Tool for Uncovering Structures

Luca Gherardini, Varun Ravi Varma, Karol Capala, Roger Woods, Jose Sousa

TL;DR

CACTUS addresses explainability and data-scarcity challenges by extending SaNDA with categorical attribute abstractions, memory-efficient on-the-fly graphs, and parallelizable pipelines. It offers two classification modes (PageRank-based and probabilistic SaNDA) and delivers outputs including knowledge graphs, binary decision trees, and correlation analyses to reveal how attributes drive class separation, validated on the WDBC and Thyroid datasets. The approach achieves competitive balanced accuracy while delivering rich interpretability through marker distributions, centrality measures, and graph communities, illustrating the value of category-preserving abstractions for medical and other data domains. Overall, CACTUS demonstrates how structured abstractions and graph-based reasoning can enable secure, explainable analytics with practical impact on small-to-moderate datasets.
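Balanced accuracy, the metric named above, is the mean of the per-class recalls, so a small class weighs as much as a large one. The sketch below is a minimal illustration of that standard definition; the function name and the toy labels are assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; every class counts equally, whatever its size."""
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy labels (made up for illustration): 0 = benign, 1 = malignant
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0]
score = balanced_accuracy(y_true, y_pred)
print(score)
```

On this imbalanced toy set, plain accuracy is 7/8, while balanced accuracy is lower because one of the two malignant samples is missed.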

Abstract

The availability of large data sets is providing an impetus for driving current artificial intelligence developments. There are, however, challenges for developing solutions with small data sets due to practical and cost-effective deployment and the opacity of deep learning models. The Comprehensive Abstraction and Classification Tool for Uncovering Structures (CACTUS) is presented for improved secure analytics by effectively employing explainable artificial intelligence. It provides additional support for categorical attributes, preserving their original meaning, optimising memory usage, and speeding up the computation through parallelisation. It shows the user the frequency of the attributes in each class and ranks them by their discriminative power. Its performance is assessed by application to the Wisconsin diagnostic breast cancer and Thyroid0387 data sets.
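To make the idea of per-class attribute frequencies and a discriminative ranking concrete, here is a minimal sketch. It is not the ranking CACTUS uses: the score (largest gap in a value's frequency between classes), the function names, and the toy rows with "U"/"D" values are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def class_frequencies(rows, labels, attribute):
    """Relative frequency of each value of `attribute` within each class."""
    counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        counts[label][row[attribute]] += 1
    return {
        label: {value: n / sum(c.values()) for value, n in c.items()}
        for label, c in counts.items()
    }


def rank_attributes(rows, labels, attributes):
    """Rank attributes by an illustrative score (assumption, not the paper's):
    the largest between-class gap in the frequency of any single value."""
    scores = {}
    for attr in attributes:
        freqs = class_frequencies(rows, labels, attr)
        values = {v for f in freqs.values() for v in f}
        scores[attr] = max(
            max(f.get(v, 0.0) for f in freqs.values())
            - min(f.get(v, 0.0) for f in freqs.values())
            for v in values
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up toy data: two abstracted attributes, labels 0 and 1
rows = [
    {"radius": "U", "texture": "U"},
    {"radius": "U", "texture": "D"},
    {"radius": "D", "texture": "U"},
    {"radius": "D", "texture": "D"},
]
labels = [1, 1, 0, 0]
ranking = rank_attributes(rows, labels, ["radius", "texture"])
print(ranking)
```

In this toy set "radius" separates the classes perfectly and "texture" not at all, so "radius" ranks first.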

Paper Structure (9 sections, 3 equations, 15 figures, 3 tables)

This paper contains 9 sections, 3 equations, 15 figures, 3 tables.

Figures (15)

  • Figure 1: Transitions in the probability of each marker in the WDBC across benign (0) and malignant (1) cancer. The title of each plot contains the average rank $\Bar{R_x}$ computed through Equation \ref{eq-avg-rank}.
  • Figure 2: Correlation matrix between the flips of the WDBC.
  • Figure 3: Knowledge graph for benign breast cancers. The node colours denote the corrected PageRank significance, while the edges are coloured in different palettes depending on the kind of flips they link: green for Up flips, red for Down flips, and plasma for categorical flips.
  • Figure 4: Knowledge graph for malignant breast cancers. The node colours denote the corrected PageRank significance, while the edges are coloured in different palettes depending on the kind of flips they link: green for Up flips, red for Down flips, and plasma for categorical flips.
  • Figure 5: Communities obtained through the Greedy algorithm in benign (left) and malignant (right) breast cancers. The blue partition in the benign graph is fully contained in the red partition of the malignant graph, but the latter comprises additional flips. Conversely, the blue community in the malignant graph is contained in the red community in the benign graph. The flips that change community membership reflect the dynamics distinguishing malignant from benign breast cancer.
  • ...and 10 more figures