Fractional Naive Bayes (FNB): non-convex optimization for a parsimonious weighted selective naive Bayes classifier

Carine Hue, Marc Boullé

TL;DR

This work presents Fractional Naive Bayes (FNB), a direct-weight optimization framework for high-dimensional Naïve Bayes classification that seeks parsimonious, robust models by sparsely regularizing the weight vector $w\in[0,1]^K$. It contrasts a prior-driven Boolean-weight averaging approach (SNB) with a non-convex, continuous-weight formulation, employing a two-stage optimization: a convex relaxation (e.g., SG.CF) followed by a local non-convex refinement, augmented by a sparse penalty based on costs $B(X_k)$ and a concave function $\xi_\delta$. Extensive experiments across 124 datasets demonstrate that SG.CF with FNB initialization achieves competitive predictive performance while markedly reducing the number of required variables, enabling scalable deployment and easier interpretation; FNB alone also matches SNB in accuracy with substantially greater sparsity. The approach is integrated into the Khiops AutoML framework, emphasizing practical benefits such as speed, interpretability, and deployment efficiency in large-scale, real-world settings.

Abstract

We study supervised classification for datasets with a very large number of input variables. The naïve Bayes classifier is attractive for its simplicity, scalability and effectiveness in many real data applications. When the strong naïve Bayes assumption of conditional independence of the input variables given the target variable is not valid, variable selection and model averaging are two common ways to improve the performance. In the case of the naïve Bayes classifier, the resulting weighting scheme on the models reduces to a weighting scheme on the variables. Here we focus on direct estimation of variable weights in such a weighted naïve Bayes classifier. We propose a sparse regularization of the model log-likelihood, which takes into account prior penalization costs related to each input variable. Compared to the averaging-based classifiers used until now, our main goal is to obtain parsimonious, robust models with fewer variables and equivalent performance. The direct estimation of the variable weights amounts to a non-convex optimization problem, for which we propose and compare several two-stage algorithms. First, the criterion obtained by convex relaxation is minimized using several variants of standard gradient methods. Then, the initial non-convex optimization problem is solved using local optimization methods initialized with the result of the first stage. The various proposed algorithms result in optimization-based weighted naïve Bayes classifiers that are evaluated on benchmark datasets and positioned with respect to a reference averaging-based classifier.
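To make the idea of a weighted naïve Bayes criterion concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard weighted naïve Bayes form, where each conditional log-probability is multiplied by a weight $w_k\in[0,1]$, and an illustrative penalty $\lambda\sum_k b_k\, w_k^p$. The function names, the array layout, and the exact penalty form are assumptions made for illustration; the paper's own criterion (costs $B(X_k)$, function $\xi_\delta$) may differ.

```python
import numpy as np

def weighted_nb_log_posterior(log_prior, log_cond, w):
    """Log posterior of a weighted naive Bayes model (hypothetical helper).

    log_prior: shape (C,), log P(Y = c)
    log_cond:  shape (N, K, C), log P(x_k | Y = c) for each instance
    w:         shape (K,), variable weights in [0, 1]
    """
    # Weighted sum of per-variable log-likelihoods, one score per class.
    scores = log_prior + np.einsum("nkc,k->nc", log_cond, w)
    # Normalize across classes with a numerically stable log-sum-exp.
    m = scores.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True))
    return scores - log_norm

def penalized_criterion(log_prior, log_cond, y, w, costs, lam=0.01, p=0.5):
    """Negative log-likelihood plus an assumed sparse penalty.

    The penalty lam * sum(costs * w**p) is an illustrative choice:
    it is concave in w for 0 < p < 1 and linear for p = 1.
    """
    log_post = weighted_nb_log_posterior(log_prior, log_cond, w)
    nll = -log_post[np.arange(len(y)), y].sum()
    penalty = lam * np.sum(costs * w ** p)
    return nll + penalty
```

In this sketch, setting a weight to zero removes the corresponding variable from the model, which is why a sparsity-inducing penalty on `w` yields fewer selected variables.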

Paper Structure (18 sections, 16 equations, 11 figures, 3 tables, 2 algorithms)

Figures (11)

  • Figure 1: Final criterion value and final variable number for the eight methods
  • Figure 2: Final variable number for the eight methods using a log scale
  • Figure 3: Final criterion value, final variable number and execution time relative to their values for the SG.CF method (excluding the AM method)
  • Figure 4: Train and test AUC for SG.CF relative to SNB for different values of the regularization coefficient $\lambda$
  • Figure 5: Number of selected variables, test AUC and compression of SG.CF relative to SNB for different values of the regularization exponent $p$
  • ...and 6 more figures