Neural Probabilistic Circuits: Enabling Compositional and Interpretable Predictions through Logical Reasoning
Weixin Chen, Simon Yu, Huajie Shao, Lui Sha, Han Zhao
TL;DR
This work tackles the interpretability gap in end-to-end deep networks by introducing Neural Probabilistic Circuits (NPCs), which couple a neural attribute recognizer with a symbolic, tractable probabilistic circuit to perform predictions via logical reasoning over attributes. They propose a three-stage training pipeline—attribute recognition, circuit construction (data-driven or knowledge-injected), and joint optimization—and prove a compositional error bound tying overall performance to module errors. The framework also offers interpretable explanations through Most Probable Explanations and Counterfactual Explanations, and demonstrates competitive accuracy on four benchmarks while enhancing transparency. The results suggest NPCs can achieve strong task performance with explicit, human-understandable reasoning, and outline limitations and future directions to further close interpretability-performance gaps in complex domains.
Abstract
End-to-end deep neural networks have achieved remarkable success across various domains but are often criticized for their lack of interpretability. While post hoc explanation methods attempt to address this issue, they often fail to accurately represent these black-box models, resulting in misleading or incomplete explanations. To overcome these challenges, we propose an inherently transparent model architecture called Neural Probabilistic Circuits (NPCs), which enable compositional and interpretable predictions through logical reasoning. In particular, an NPC consists of two modules: an attribute recognition model, which predicts probabilities for various attributes, and a task predictor built on a probabilistic circuit, which enables logical reasoning over recognized attributes to make class predictions. To train NPCs, we introduce a three-stage training algorithm comprising attribute recognition, circuit construction, and joint optimization. Moreover, we theoretically demonstrate that an NPC's error is upper-bounded by a linear combination of the errors from its modules. To further demonstrate the interpretability of NPC, we provide both the most probable explanations and the counterfactual explanations. Empirical results on four benchmark datasets show that NPCs strike a balance between interpretability and performance, achieving results competitive even with those of end-to-end black-box models while providing enhanced interpretability.
