
Efficient Parameter Estimation for Bayesian Network Classifiers using Hierarchical Linear Smoothing

Connor Cooper, Geoffrey I. Webb, Daniel F. Schmidt

TL;DR

This work addresses the limited performance and rigidity of traditional Bayesian network classifiers (BNCs) by introducing Hierarchical Linear Smoothing (HLS), a log-linear model that approximates hierarchical Dirichlet process (HDP) smoothing of conditional probability tables (CPTs). HLS constructs a design matrix from the CPT tree and learns the CPT parameters jointly via multinomial logistic regression, using either standard ridge regularization or Bayesian global-local shrinkage, with Pólya-Gamma augmentation enabling efficient sampling for the Bayesian variants. Empirically, HLS matches or outperforms both HDP smoothing and additive smoothing, can outperform random forests on log loss, and runs significantly faster than HDP. The approach preserves the interpretability and scalability of BNCs and offers a flexible framework for extending smoothing through linear-model theory, which suggests practical applicability to large categorical datasets and motivates future extensions to higher-cardinality and continuous features.
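HLS is described as building a design matrix from the CPT tree. The sketch below shows one plausible construction and is only a sketch: it assumes that each column is a node of the tree (the root plus every prefix of a parent configuration, as in the tree of Figure 1) and that each row is a leaf with 1s on its root-to-leaf path. The function name `path_design_matrix` is hypothetical, and the paper's exact column layout may differ.

```python
from itertools import product


def path_design_matrix(cards):
    """Build a path-indicator design matrix for a CPT tree (hypothetical helper).

    cards[i] is the number of values of the i-th parent, in the order the
    tree branches on them. Columns are tree nodes (the root, then every
    prefix of a parent configuration, level by level); rows are leaves
    (full parent configurations). Entry (leaf, node) is 1 when the node
    lies on that leaf's root-to-leaf path.
    """
    nodes = [()]  # the root is the empty prefix
    for depth in range(1, len(cards) + 1):
        nodes.extend(product(*(range(c) for c in cards[:depth])))
    column = {node: i for i, node in enumerate(nodes)}

    leaves = list(product(*(range(c) for c in cards)))
    rows = []
    for leaf in leaves:
        row = [0] * len(nodes)
        # Mark the root, then each ancestor prefix, then the leaf itself.
        for depth in range(len(leaf) + 1):
            row[column[leaf[:depth]]] = 1
        rows.append(row)
    return nodes, leaves, rows


# Figure 1 setting: binary X1 branched on first, then binary X2.
nodes, leaves, Z = path_design_matrix([2, 2])
for leaf, row in zip(leaves, Z):
    print(leaf, row)
```

Under this encoding, a log-linear model gives each leaf a logit per child value equal to the sum of the coefficients on its path. A sparsely observed leaf therefore leans on its ancestors' coefficients, which loosely mirrors how HDP smoothing shares strength down the tree.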

Abstract

Bayesian network classifiers (BNCs) possess a number of properties desirable for a modern classifier: They are easily interpretable, highly scalable, and offer adaptable complexity. However, traditional methods for learning BNCs have historically underperformed when compared to leading classification methods such as random forests. Recent parameter smoothing techniques using hierarchical Dirichlet processes (HDPs) have enabled BNCs to achieve performance competitive with random forests on categorical data, but these techniques are relatively inflexible, and require a complicated, specialized sampling process. In this paper, we introduce a novel method for parameter estimation that uses a log-linear regression to approximate the behaviour of HDPs. As a linear model, our method is remarkably flexible and simple to interpret, and can leverage the vast literature on learning linear models. Our experiments show that our method can outperform HDP smoothing while being orders of magnitude faster, remaining competitive with random forests on categorical data.
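The log-linear regression can be illustrated with a small, self-contained sketch. It assumes a softmax (multinomial logistic) regression for a single child variable, fit to aggregated counts over parent configurations with an L2 (ridge) penalty `lam` using plain gradient descent. The names `fit_ridge_softmax` and `predict_cpt`, the optimizer, the step size, and the penalty scaling are all assumptions for illustration, not the paper's implementation. The design rows use a path-indicator encoding: one root column plus one column per value of a single binary parent.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_cpt(Z, beta):
    """P(child = k | parent configuration j) = softmax_k(z_j . beta[:, k])."""
    D, K = len(beta), len(beta[0])
    return [softmax([sum(z[d] * beta[d][k] for d in range(D)) for k in range(K)])
            for z in Z]


def fit_ridge_softmax(Z, counts, lam=1.0, step=0.05, iters=5000):
    """Minimize -sum_jk n_jk log p_jk + (lam / 2) * ||beta||^2 by gradient descent.

    Z[j] is the design row of parent configuration j, and counts[j][k] is how
    often the child took value k under that configuration (hypothetical API).
    """
    D, K = len(Z[0]), len(counts[0])
    beta = [[0.0] * K for _ in range(D)]
    for _ in range(iters):
        probs = predict_cpt(Z, beta)
        # Ridge term of the gradient.
        grad = [[lam * beta[d][k] for k in range(K)] for d in range(D)]
        # Likelihood term: z_jd * (N_j * p_jk - n_jk), summed over configurations.
        for z, n, p in zip(Z, counts, probs):
            total = sum(n)
            for d in range(D):
                if z[d]:
                    for k in range(K):
                        grad[d][k] += z[d] * (total * p[k] - n[k])
        for d in range(D):
            for k in range(K):
                beta[d][k] -= step * grad[d][k]
    return beta


# One binary parent, three-valued child. Columns: [root, parent=0, parent=1].
Z = [[1, 1, 0],
     [1, 0, 1]]
counts = [[8, 1, 1],   # well-observed configuration
          [0, 0, 2]]   # sparsely observed configuration
beta = fit_ridge_softmax(Z, counts, lam=1.0)
for row in predict_cpt(Z, beta):
    print([round(p, 3) for p in row])
```

Both rows share the root column, so the sparsely observed configuration gets nonzero probability for child values it never saw. Its estimate also depends on the other configuration's data through that shared coefficient. A larger `lam` shrinks all coefficients harder toward zero. The Bayesian variants described in the paper instead place global-local shrinkage priors on the coefficients and sample them using Pólya-Gamma augmentation, which this sketch does not attempt.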

Paper Structure

This paper contains 28 sections, 16 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: For a child node $X_c$ with binary-valued parents $X_1$ and $X_2$, a tree that branches first on the values of $X_1$ and then on the values of $X_2$, with the corresponding: (a) parameters for HDP, (b) coefficients for HLS, and (c) design matrix for HLS.
  • Figure 2: (a) Win-Draw-Loss records on kDB-3 for various regularization strategies vs ridge with $\tau=1$. (b) Scatter plot for Bayesian inverse-gamma (HLS-IG) vs ridge regression (HLS-NB) on kDB-3, under zero-one loss. Red squares indicate datasets with top-15 cross-validation variance.
  • Figure 3: Scatter plots for HLS-IG on TAN vs random forests, under (a) zero-one loss and (b) log loss. Red squares indicate datasets with top-15 cross-validation variance.
  • Figure 4: Critical difference diagrams representing the mean rank of models under zero-one loss and log loss, across the 42 datasets that HDP was trained on. BNCs use TAN as a structure.
  • Figure 5: Scatter plots for HLS-NB with an intercept vs HLS-NB with no intercept, on kDB-3, under (a) zero-one loss and (b) log loss. Red squares indicate datasets with top-15 cross-validation variance.
  • ...and 4 more figures
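Figures 2 and 4 summarise cross-dataset comparisons as win-draw-loss records and mean ranks. The sketch below covers both computations and rests on several assumptions: there is one loss per dataset per model (lower is better), losses within `tol` count as draws, and tied models share the average of the ranks they span. The helper names and the `tol` parameter are hypothetical, and the paper may define draws or ties differently.

```python
def win_draw_loss(losses_a, losses_b, tol=0.0):
    """Count datasets where model A's loss is lower (win), equal within tol (draw), or higher (loss) than B's."""
    if len(losses_a) != len(losses_b):
        raise ValueError("both models must be scored on the same datasets")
    wins = draws = losses = 0
    for a, b in zip(losses_a, losses_b):
        if abs(a - b) <= tol:
            draws += 1
        elif a < b:
            wins += 1
        else:
            losses += 1
    return wins, draws, losses


def mean_ranks(loss_table):
    """loss_table[i][m] is the loss of model m on dataset i; returns each model's mean rank (1 = best)."""
    n_models = len(loss_table[0])
    totals = [0.0] * n_models
    for row in loss_table:
        order = sorted(range(n_models), key=lambda m: row[m])
        i = 0
        while i < n_models:
            # Extend j over a run of tied losses starting at position i.
            j = i
            while j + 1 < n_models and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
            for t in range(i, j + 1):
                totals[order[t]] += shared
            i = j + 1
    return [t / len(loss_table) for t in totals]


print(win_draw_loss([0.10, 0.20, 0.30], [0.20, 0.20, 0.10]))  # (1, 1, 1)
ranks = mean_ranks([[0.1, 0.2, 0.3],
                    [0.3, 0.2, 0.1],
                    [0.1, 0.1, 0.2]])
print([round(r, 3) for r in ranks])  # [1.833, 1.833, 2.333]
```

A critical difference diagram then places models along an axis by these mean ranks and links groups whose differences are not statistically significant. The excerpt does not say which post-hoc test the paper uses to draw those links.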