XNB: Explainable Class-Specific Naive-Bayes Classifier

Jesus S. Aguilar-Ruiz, Cayetano Romero, Andrea Cicconardi

TL;DR

The paper presents the Explainable Class-Specific Naive Bayes (XNB) classifier, which introduces two innovations: Kernel Density Estimation to calculate posterior probabilities, allowing a more accurate and flexible estimation process, and class-specific feature subsets, so that only the most relevant variables for each class are used.

Abstract

In today's data-intensive landscape, where high-dimensional datasets are increasingly common, reducing the number of input features is essential to prevent overfitting and improve model accuracy. Despite numerous efforts to tackle dimensionality reduction, most approaches apply a universal set of features across all classes, potentially missing the unique characteristics of individual classes. This paper presents the Explainable Class-Specific Naive Bayes (XNB) classifier, which introduces two critical innovations: 1) the use of Kernel Density Estimation to calculate posterior probabilities, allowing for a more accurate and flexible estimation process, and 2) the selection of class-specific feature subsets, ensuring that only the most relevant variables for each class are utilized. Extensive empirical analysis on high-dimensional genomic datasets shows that XNB matches the classification performance of traditional Naive Bayes while drastically improving model interpretability. By isolating the most relevant features for each class, XNB not only reduces the feature set to a minimal, distinct subset for each class but also provides deeper insights into how the model makes predictions. This approach offers significant advantages in fields where both precision and explainability are critical.
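
The two ideas in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a Gaussian kernel with a normal-reference bandwidth, and it takes the per-class feature subsets as a given input (`features_per_class` is a hypothetical parameter), because this section does not say how XNB selects them. The class name `KDEClassSpecificNB` and the toy data are likewise illustrative.

```python
import math
from statistics import stdev


def gaussian_kde_pdf(x, samples, bandwidth):
    """Gaussian kernel density estimate at x from 1-D samples."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm


def rule_of_thumb_bandwidth(samples):
    """Assumed bandwidth rule: 1.06 * sigma * n^(-1/5), floored to stay positive."""
    sigma = stdev(samples) if len(samples) > 1 else 0.0
    return max(1.06 * sigma * len(samples) ** -0.2, 1e-3)


class KDEClassSpecificNB:
    """Naive Bayes with per-class KDE likelihoods on per-class feature subsets."""

    def __init__(self, features_per_class):
        # Hypothetical input: {label: [column indices used for that class]}.
        self.features_per_class = features_per_class

    def fit(self, X, y):
        # Assumes every label in features_per_class has at least one training row.
        self.priors_, self.samples_, self.bandwidths_ = {}, {}, {}
        for label, cols in self.features_per_class.items():
            rows = [row for row, target in zip(X, y) if target == label]
            self.priors_[label] = len(rows) / len(y)
            for j in cols:
                values = [row[j] for row in rows]
                self.samples_[(label, j)] = values
                self.bandwidths_[(label, j)] = rule_of_thumb_bandwidth(values)
        return self

    def log_scores(self, x):
        """Log prior plus summed log KDE likelihoods over each class's own features."""
        scores = {}
        for label, cols in self.features_per_class.items():
            score = math.log(self.priors_[label])
            for j in cols:
                key = (label, j)
                density = gaussian_kde_pdf(x[j], self.samples_[key], self.bandwidths_[key])
                score += math.log(max(density, 1e-300))  # guard against log(0)
            scores[label] = score
        return scores

    def predict(self, x):
        scores = self.log_scores(x)
        return max(scores, key=scores.get)


# Toy data: class "a" is characterised by column 0, class "b" by column 1;
# column 2 is noise and is never modelled.
X = [[0.0, 5.0, 1.0], [0.2, 5.1, 9.0], [0.1, 4.9, 3.0],
     [3.0, 1.0, 2.0], [3.1, 1.2, 8.0], [2.9, 0.9, 4.0]]
y = ["a", "a", "a", "b", "b", "b"]
model = KDEClassSpecificNB({"a": [0], "b": [1]}).fit(X, y)
pred_a = model.predict([0.1, 5.0, 5.0])
pred_b = model.predict([3.0, 1.0, 5.0])
```

Because each class is scored on a different set of variables, the raw scores are not on a common scale. The sketch compares them directly as a simplification; how XNB combines or normalises them is not described in this section.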

Paper Structure

This paper contains 14 sections, 12 equations, 1 figure, 2 tables, 1 algorithm.

Figures (1)

  • Figure 1: Comparative scheme of the methodology, illustrated on Brain GSE50161 (54,675 variables, 5 classes). Results are shown for one fold of stratified 10-fold cross-validation. Feature selection identified 21 unique variables in total (class-independent approach). The number of variables per class was #vars = [14, 10, 9, 7, 4], with a mean of 8.8 (class-specific approach). Accuracy was 0.846 for NB (using all the original variables) and 1.000 for XNB.