Stable Structure Learning with HC-Stable and Tabu-Stable Algorithms

Neville K. Kitson, Anthony C. Constantinou

TL;DR

The paper tackles instability in Bayesian Network structure learning caused by dataset artifacts such as variable order, extending stability concerns to score-based and hybrid methods. It proposes HC-Stable and Tabu-Stable, which compute a stable node order to remove arbitrary edge orientations during hill-climbing, yielding invariant results and improved objective scores. Empirical results show Tabu-Stable often achieves the highest $S_{BIC}$ and best categorical accuracy, with zero BIC/structural instability across many networks, while residual instability in categorical networks can arise from duplicate variable sequences. The work provides a scalable, practical solution with public code and promotes stability as a core criterion in algorithm design and benchmarking.

Abstract

Many Bayesian Network structure learning algorithms are unstable, with the learned graph sensitive to arbitrary dataset artifacts, such as the ordering of columns (i.e., variable order). PC-Stable attempts to address this issue for the widely-used PC algorithm, prompting researchers to use the "stable" version instead. However, this problem seems to have been overlooked for score-based algorithms. In this study, we show that some widely-used score-based algorithms, as well as hybrid and constraint-based algorithms, including PC-Stable, suffer from the same issue. We propose a novel solution for score-based greedy hill-climbing that eliminates instability by determining a stable node order, leading to consistent results regardless of variable ordering. Two implementations, HC-Stable and Tabu-Stable, are introduced. Tabu-Stable achieves the highest BIC scores across all networks, and the highest accuracy for categorical networks. These results highlight the importance of addressing instability in structure learning and provide a robust and practical approach for future applications. This extends the scope and impact of our previous work presented at Probabilistic Graphical Models 2024 by incorporating continuous variables. The implementation, along with usage instructions, is freely available on GitHub at https://github.com/causal-iq/discovery.
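
The order sensitivity described in the abstract can be illustrated with a minimal sketch. It assumes the standard BIC form (log-likelihood minus (log N)/2 times the number of free parameters) and a toy two-variable dataset. The helper names `bic_node`, `first_arc` and `marginal_key` are hypothetical, and the data-derived tie-break shown here is only an illustration, not the ordering rule used by HC-Stable or Tabu-Stable. Adding A→B or B→A to an empty graph yields the same score gain, so a naive search keeps whichever arc it meets first, which depends on column order.

```python
import math
from collections import Counter


def bic_node(rows, child, parents):
    """BIC contribution of one categorical node given its parents (assumed standard form)."""
    n = len(rows)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in rows)
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    child_states = len({r[child] for r in rows})
    parent_configs = math.prod(len({r[p] for r in rows}) for p in parents)
    return loglik - 0.5 * math.log(n) * (child_states - 1) * parent_configs


def first_arc(rows, columns, tie_key=None):
    """Choose the best single arc to add to an empty graph."""
    candidates = []
    for a in columns:          # iteration follows the column order of the dataset
        for b in columns:
            if a != b:
                delta = bic_node(rows, b, [a]) - bic_node(rows, b, [])
                candidates.append((delta, a, b))
    best = max(d for d, _, _ in candidates)
    tied = [(a, b) for d, a, b in candidates if abs(d - best) < 1e-9]
    if tie_key is None:
        return tied[0]         # naive: first arc encountered wins the tie
    return min(tied, key=tie_key)  # tie broken by a data-derived key


# Toy data: A is balanced, B is skewed, and the two are dependent.
pattern = [(0, 0), (0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 1), (1, 1)]
rows = [{"A": a, "B": b} for a, b in pattern] * 25


def marginal_key(arc):
    """Hypothetical tie-break: marginal BIC of the parent, which ignores names and column order."""
    return bic_node(rows, arc[0], [])


naive_ab = first_arc(rows, ["A", "B"])
naive_ba = first_arc(rows, ["B", "A"])
stable_ab = first_arc(rows, ["A", "B"], marginal_key)
stable_ba = first_arc(rows, ["B", "A"], marginal_key)
print(naive_ab, naive_ba, stable_ab, stable_ba)
```

A key computed from the data alone cannot separate two variables whose columns are identical, so in that case the sketch falls back to the order in which the tied arcs were encountered.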

Paper Structure

This paper contains 16 sections, 7 equations, 4 figures, 11 tables, and 2 algorithms.

Figures (4)

  • Figure 1: The sequence of DAG changes when HC learns the categorical Asia network from 10,000 samples. The numbers beside each arc show the iteration at which it is added. Arc colours compare the learned arc against the true arc, and whether it is solid or dashed indicates whether its orientation was arbitrary. Variable order within the dataset is alphabetic.
  • Figure 2: A comparison of the CPDAG F1 scores of the categorical-variable graphs learned by Tabu using different node orderings: a) variable order, b) simple increasing score order, c) simple decreasing score order, and d) Tabu-Stable. For each network, 25 experiments with randomised variable names, variable order and row order are conducted at each of the four sample sizes. Shading around lines indicates the SD of F1 values. Note that lines are drawn on the chart in the order shown in the key, so a particular line may be hidden where values are coincident.
  • Figure 3: Mean values of structural, inference and stability metrics across categorical networks and sample sizes for different algorithms.
  • Figure 4: Mean values of structural, inference and stability metrics across continuous networks and sample sizes for different algorithms.
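
The randomisation mentioned in the Figure 2 caption (variable names, variable order and row order) could be sketched as follows. This is a minimal illustration under stated assumptions: the function `randomise_dataset`, the `X000`-style naming scheme and the toy rows are all hypothetical, and the authors' actual procedure may differ.

```python
import random


def randomise_dataset(columns, rows, seed):
    """Return the same data with renamed variables, shuffled columns and shuffled rows."""
    rng = random.Random(seed)
    # Hypothetical naming scheme: X000, X001, ... assigned to variables at random
    new_names = [f"X{i:03d}" for i in range(len(columns))]
    rng.shuffle(new_names)
    rename = dict(zip(columns, new_names))
    # Shuffle the column order independently of the renaming
    new_columns = [rename[c] for c in columns]
    rng.shuffle(new_columns)
    # Shuffle the rows, then rebuild each row in the new column order
    shuffled = list(rows)
    rng.shuffle(shuffled)
    new_rows = [{rename[c]: r[c] for c in columns} for r in shuffled]
    new_rows = [{c: r[c] for c in new_columns} for r in new_rows]
    return new_columns, new_rows, rename


original_columns = ["A", "B", "C"]
original_rows = [
    {"A": "yes", "B": "no", "C": "no"},
    {"A": "no", "B": "no", "C": "yes"},
    {"A": "yes", "B": "yes", "C": "no"},
    {"A": "no", "B": "yes", "C": "yes"},
]
new_columns, new_rows, rename = randomise_dataset(original_columns, original_rows, seed=1)
print(new_columns)
```

A stable learner should return equivalent graphs for `original_rows` and `new_rows` once the names are mapped back through `rename`.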