Efficient Parameter Estimation for Bayesian Network Classifiers using Hierarchical Linear Smoothing
Connor Cooper, Geoffrey I. Webb, Daniel F. Schmidt
TL;DR
This work addresses the limited performance and rigidity of traditional Bayesian network classifiers (BNCs) by introducing Hierarchical Linear Smoothing (HLS), a log-linear model that approximates hierarchical Dirichlet process (HDP) smoothing of conditional probability tables (CPTs). HLS constructs a design matrix from the CPT tree and learns the CPT parameters jointly via multinomial logistic regression. It is regularized either with a standard ridge penalty or with Bayesian global-local shrinkage priors, and the Bayesian variants use Pólya-Gamma augmentation for efficient inference. Empirically, HLS performs competitively with or better than HDP smoothing and additive smoothing, can outperform random forests on log loss, and runs orders of magnitude faster than HDP smoothing. The approach preserves the interpretability and scalability of BNCs and, as a linear model, provides a flexible framework for extending smoothing with existing linear-model theory. These properties suggest practical applicability to large categorical datasets, with future extensions to larger cardinalities and continuous features.
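The TL;DR states the mechanism only at a high level. The Python/NumPy sketch below is one plausible reading of it. It rests on assumptions that are ours, not the paper's. A single CPT P(child | parents) stands in for the whole network. The CPT tree splits on the parents in a fixed order. The design matrix has one indicator column per tree node, so a leaf's logits are the sum of the coefficients along its root-to-leaf path. Ridge regularization (with an unpenalized root/intercept column) is fitted by plain gradient descent, which is a stand-in solver. The helper names `build_design_matrix`, `fit_ridge_softmax`, and `predict_proba` are hypothetical. The Bayesian global-local shrinkage variants are not covered.

```python
import numpy as np


def build_design_matrix(parents, node_index=None):
    """One-hot indicators for the CPT-tree nodes on each row's root-to-leaf path.

    ASSUMPTION (hypothetical, not the paper's exact construction): the tree splits
    on the parent columns in order and every tree node gets one column.
    Column 0 is the root, so it acts as an intercept.
    """
    grow = node_index is None
    if grow:
        node_index = {(): 0}  # path prefix (tuple of parent values) -> column index
    n, k = parents.shape
    rows, cols = [], []
    for i in range(n):
        for depth in range(k + 1):
            prefix = tuple(int(v) for v in parents[i, :depth])
            col = node_index.get(prefix)
            if col is None:
                if not grow:
                    break  # unseen node: keep only its seen ancestors (back-off)
                col = len(node_index)
                node_index[prefix] = col
            rows.append(i)
            cols.append(col)
    X = np.zeros((n, len(node_index)))
    X[rows, cols] = 1.0
    return X, node_index


def predict_proba(X, W):
    """Softmax of log-linear scores: a leaf's logits sum its ancestors' weights."""
    Z = X @ W
    Z -= Z.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(Z)
    return P / P.sum(axis=1, keepdims=True)


def fit_ridge_softmax(X, y, n_values, lam=1.0, lr=0.5, n_iter=2000):
    """Ridge-penalized multinomial logistic regression by gradient descent.

    Deeper tree nodes are shrunk toward 0, so sparsely observed leaves fall back
    toward their parents' estimates. The root column is not penalized.
    """
    n, d = X.shape
    W = np.zeros((d, n_values))
    Y = np.eye(n_values)[y]  # one-hot targets for the child variable
    for _ in range(n_iter):
        P = predict_proba(X, W)
        penalty_grad = lam * W / n
        penalty_grad[0] = 0.0  # leave the intercept unshrunk
        W -= lr * (X.T @ (P - Y) / n + penalty_grad)
    return W


# Synthetic usage (illustrative only): 2 parents with 3 values each, 3 child values.
rng = np.random.default_rng(0)
n = 300
parents = rng.integers(0, 3, size=(n, 2))
y = (parents[:, 0] + (rng.random(n) < 0.2)) % 3  # child mostly set by parent 0
X, nodes = build_design_matrix(parents)
W = fit_ridge_softmax(X, y, n_values=3, lam=1.0)
P = predict_proba(X, W)
train_log_loss = -np.mean(np.log(P[np.arange(n), y]))

# Second parent value 5 never appeared: only the root and depth-1 node fire.
X_new, _ = build_design_matrix(np.array([[0, 5]]), nodes)
p_new = predict_proba(X_new, W)
```

Because the unseen value contributes no column, the prediction for that row uses only the root and first-parent weights. This is a simple form of the back-off toward coarser contexts that hierarchical smoothing provides. Replacing the ridge penalty with a global-local shrinkage prior sampled via Pólya-Gamma augmentation, as the TL;DR describes for the Bayesian variants, is outside the scope of this sketch.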
Abstract
Bayesian network classifiers (BNCs) possess a number of properties desirable for a modern classifier: They are easily interpretable, highly scalable, and offer adaptable complexity. However, traditional methods for learning BNCs have historically underperformed when compared to leading classification methods such as random forests. Recent parameter smoothing techniques using hierarchical Dirichlet processes (HDPs) have enabled BNCs to achieve performance competitive with random forests on categorical data, but these techniques are relatively inflexible and require a complicated, specialized sampling process. In this paper, we introduce a novel method for parameter estimation that uses a log-linear regression to approximate the behaviour of HDPs. As a linear model, our method is remarkably flexible and simple to interpret, and can leverage the vast literature on learning linear models. Our experiments show that our method can outperform HDP smoothing while being orders of magnitude faster, remaining competitive with random forests on categorical data.
