Stable Structure Learning with HC-Stable and Tabu-Stable Algorithms
Neville K. Kitson, Anthony C. Constantinou
TL;DR
The paper tackles instability in Bayesian Network structure learning caused by dataset artifacts such as variable order, extending stability concerns to score-based and hybrid methods. It proposes HC-Stable and Tabu-Stable, which compute a stable node order to remove arbitrary edge orientations during hill-climbing, yielding invariant results and improved objective scores. Empirical results show Tabu-Stable often achieves the highest $S_{BIC}$ and best categorical accuracy, with zero BIC/structural instability across many networks, while residual instability in categorical networks can arise from duplicate variable sequences. The work provides a scalable, practical solution with public code and promotes stability as a core criterion in algorithm design and benchmarking.
Abstract
Many Bayesian Network structure learning algorithms are unstable, with the learned graph sensitive to arbitrary dataset artifacts, such as the ordering of columns (i.e., variable order). PC-Stable attempts to address this issue for the widely used PC algorithm, prompting researchers to use the "stable" version instead. However, this problem seems to have been overlooked for score-based algorithms. In this study, we show that some widely used score-based algorithms, as well as hybrid and constraint-based algorithms, including PC-Stable, suffer from the same issue. We propose a novel solution for score-based greedy hill-climbing that eliminates instability by determining a stable node order, leading to consistent results regardless of variable ordering. Two implementations, HC-Stable and Tabu-Stable, are introduced. Tabu-Stable achieves the highest BIC scores across all networks, and the highest accuracy for categorical networks. These results highlight the importance of addressing instability in structure learning and provide a robust and practical approach for future applications. This paper extends the scope and impact of our previous work presented at Probabilistic Graphical Models 2024 by incorporating continuous variables. The implementation, along with usage instructions, is freely available on GitHub at https://github.com/causal-iq/discovery.
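To make the source of instability concrete, the following minimal sketch shows one way column order can leak into a greedy score-based search. It is not the HC-Stable or Tabu-Stable implementation, and the function names `bic_two_node` and `first_best_edge`, the toy data, and the tie-breaking rule are all assumptions made for illustration. It relies only on the standard fact that BIC is score-equivalent: for two categorical variables, the graphs A -> B and B -> A receive the same score, so a search that keeps the first best-scoring edge it encounters will orient the edge according to the order in which the columns are listed.

```python
import math
from collections import Counter


def bic_two_node(rows, parent, child):
    """BIC of the two-node graph parent -> child on categorical data.

    Uses maximum-likelihood parameters; assumes every state appears in the data.
    """
    n = len(rows)
    parent_counts = Counter(r[parent] for r in rows)
    joint_counts = Counter((r[parent], r[child]) for r in rows)
    child_states = {r[child] for r in rows}

    # Log-likelihood of the parent marginal plus the child given the parent.
    loglik = sum(c * math.log(c / n) for c in parent_counts.values())
    loglik += sum(
        c * math.log(c / parent_counts[p]) for (p, _), c in joint_counts.items()
    )

    # Free parameters: parent marginal plus one child distribution per parent state.
    free_params = (len(parent_counts) - 1) + len(parent_counts) * (
        len(child_states) - 1
    )
    return loglik - 0.5 * math.log(n) * free_params


def first_best_edge(rows, columns):
    """Hypothetical naive tie-breaking: keep the first edge with the best score."""
    best = None
    for parent in columns:
        for child in columns:
            if parent == child:
                continue
            score = bic_two_node(rows, parent, child)
            # Only a strictly better score (beyond rounding noise) replaces the incumbent.
            if best is None or score > best[0] + 1e-9:
                best = (score, parent, child)
    return best[1], best[2]


# Toy dataset of 100 rows over two dependent binary variables.
pairs = [(0, 0)] * 40 + [(0, 1)] * 10 + [(1, 0)] * 15 + [(1, 1)] * 35
rows = [{"A": a, "B": b} for a, b in pairs]

print(first_best_edge(rows, ["A", "B"]))
print(first_best_edge(rows, ["B", "A"]))
```

Under these assumptions the two calls return opposite orientations of the same edge, even though the data are identical. The stable node order described in the abstract is aimed at removing this kind of arbitrary, order-driven choice; how that order is computed is not reproduced here.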
