A Unified Framework for Human-Allied Learning of Probabilistic Circuits
Athresh Karanam, Saurabh Mathur, Sahil Sidheekh, Sriraam Natarajan
TL;DR
This work proposes a unified framework for infusing domain knowledge into the learning of probabilistic circuits (PCs) by encoding various forms of knowledge as differentiable domain constraints. Learning becomes a constrained optimization problem. The PC log-likelihood $\mathcal{L}(\mathcal{M}, \mathcal{D})$ is maximized, and constraint satisfaction is enforced through a penalty function $\zeta$ whose weight $\lambda_t$ is updated multiplicatively by a factor $\gamma$ at each iteration (Algorithm 1). Knowledge is expressed as equality or inequality constraints on marginal and conditional queries. These constraints cover generalization, context-specific independence, monotonic influence, class imbalance, and privileged information, which yields six instantiations of the framework. The framework is validated on synthetic, benchmark, and real-world data using RatSPN and EinsumNet. The results show that the constraints are faithfully integrated, that generalization improves over purely data-driven learning, and that the method is robust to constraint noise and hyperparameter choices. Overall, the approach offers a simple, effective pathway for domain-expert-guided PC learning in data-scarce, knowledge-rich settings, with broad applicability and potential extensions to relational data and PC structure learning.
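The penalty scheme summarized above can be illustrated with a minimal, self-contained sketch. It rests on several assumptions. The model is a hypothetical three-parameter PC over two binary variables: a root sum node over two products, each pairing an indicator leaf for X1 with a Bernoulli leaf for X2. $\zeta$ is a squared-hinge penalty for one monotonic-influence constraint, $P(X_2{=}1 \mid X_1{=}1) \ge P(X_2{=}1 \mid X_1{=}0)$. Optimization is plain gradient ascent, and central finite differences stand in for automatic differentiation. The names `pc_prob`, `cond_x2`, `zeta`, `fit`, and every hyperparameter value are illustrative choices. This is not the authors' Algorithm 1 or their RatSPN/EinsumNet implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def pc_prob(theta, x1, x2=None):
    """Tiny deterministic PC over binary X1, X2 (hypothetical example).

    Root sum node with weights (w, 1 - w) over two product nodes:
      [X1 = 1] x Bernoulli(X2; q1)   and   [X1 = 0] x Bernoulli(X2; q0).
    theta = [logit(w), logit(q1), logit(q0)].
    Passing x2=None marginalises X2 by setting its leaves to 1.
    """
    w, q1, q0 = (sigmoid(t) for t in theta)

    def x2_leaf(q):
        if x2 is None:
            return 1.0
        return q if x2 == 1 else 1.0 - q

    return w * float(x1 == 1) * x2_leaf(q1) + (1.0 - w) * float(x1 == 0) * x2_leaf(q0)


def cond_x2(theta, x1):
    """Conditional query P(X2 = 1 | X1 = x1) as a ratio of two PC evaluations."""
    return pc_prob(theta, x1, 1) / pc_prob(theta, x1, None)


def log_likelihood(theta, data):
    """Average log-likelihood of the data under the PC."""
    return sum(math.log(pc_prob(theta, x1, x2)) for x1, x2 in data) / len(data)


def zeta(theta):
    """Squared-hinge penalty for P(X2=1 | X1=1) >= P(X2=1 | X1=0).

    Zero whenever the constraint holds, positive when it is violated.
    """
    violation = cond_x2(theta, 0) - cond_x2(theta, 1)
    return max(0.0, violation) ** 2


def grad(f, theta, eps=1e-5):
    """Central finite differences, standing in for automatic differentiation."""
    g = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        g.append((f(up) - f(down)) / (2.0 * eps))
    return g


def fit(data, lam0=1.0, gamma=2.0, rounds=6, steps=500, lr=0.1, seed=0):
    """Penalty method: ascend LL - lam * zeta, then multiply lam by gamma."""
    rng = random.Random(seed)
    theta = [rng.uniform(-0.5, 0.5) for _ in range(3)]
    lam = lam0
    for _ in range(rounds):
        def objective(th):
            return log_likelihood(th, data) - lam * zeta(th)

        for _ in range(steps):
            theta = [t + lr * d for t, d in zip(theta, grad(objective, theta))]
        lam *= gamma
    return theta


# Six samples in which X2 = 1 is *less* frequent when X1 = 1.
data = [(1, 0), (1, 0), (1, 1), (0, 1), (0, 1), (0, 0)]
free = fit(data, lam0=0.0)    # lam stays 0: purely data-driven maximum likelihood
guided = fit(data, lam0=1.0)  # constraint enforced with a growing penalty weight
for name, th in (("data-only", free), ("constrained", guided)):
    print(name, round(cond_x2(th, 1), 3), round(cond_x2(th, 0), 3))
```

With the data-only fit, the two conditionals should settle near the empirical values 1/3 and 2/3. With the penalty, they are pulled toward a common value near 1/2. Because $\zeta$ is nonzero only when the constraint is violated, the iterates approach the feasible set from outside, so a small violation remains for any finite $\lambda_t$. This is why the weight keeps growing by $\gamma$. The step size is kept small because the penalty's curvature grows with $\lambda_t$.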
Abstract
Probabilistic Circuits (PCs) have emerged as an efficient framework for representing and learning complex probability distributions. Nevertheless, existing research on PCs concentrates predominantly on data-driven parameter learning and often neglects the potential of knowledge-intensive learning. This is a particular limitation in data-scarce, knowledge-rich domains such as healthcare. To bridge this gap, we propose a novel unified framework that systematically integrates diverse domain knowledge into the parameter learning process of PCs. Experiments on several benchmarks and real-world datasets show that our framework leverages domain knowledge effectively and efficiently, achieving superior performance compared to purely data-driven learning approaches.
