NEUROLOGIC: From Neural Representations to Interpretable Logic Rules
Chuqin Geng, Anqi Xing, Li Zhang, Ziyu Zhao, Yuhe Jiang, Xujie Si
TL;DR
This work tackles the interpretability gap in deep neural networks by introducing NeuroLogic, a framework that extracts global, interpretable logic rules directly from neural representations. It identifies class-specific hidden predicates via Neural Activation Patterns (NAPs) across layers, converts activations into predicates, constructs discriminative rules in Disjunctive Normal Form (DNF), and grounds these rules in a human-interpretable input space using flexible grounding strategies and DSL-driven symbolic expressions. The approach scales to modern architectures, including Transformers, and yields compact, faithful rule sets that can be grounded in simple inputs or rich vocabulary spaces, with causal grounding supporting semantic interpretability. Empirically, NeuroLogic achieves competitive fidelity and accuracy on small benchmarks while producing substantially shorter, more interpretable rule sets. For Transformer-based sentiment analysis, it shows effective grounding and rule construction, matching or outperforming baselines on interpretability metrics.
Abstract
Rule-based explanation methods offer rigorous and globally interpretable insights into neural network behavior. However, existing approaches are mostly limited to small fully connected networks and depend on costly layerwise rule extraction and substitution processes. These limitations hinder their generalization to more complex architectures such as Transformers. Moreover, existing methods produce shallow, decision-tree-like rules that fail to capture rich, high-level abstractions in complex domains like computer vision and natural language processing. To address these challenges, we propose NEUROLOGIC, a novel framework that extracts interpretable logical rules directly from deep neural networks. Unlike previous methods, NEUROLOGIC constructs logic rules over hidden predicates derived from neural representations at any chosen layer, rather than relying on costly layerwise extraction and rewriting. This flexibility enables broader architectural compatibility and improved scalability. Furthermore, NEUROLOGIC supports richer logical constructs and can incorporate human prior knowledge to ground hidden predicates back to the input space, enhancing interpretability. We validate NEUROLOGIC on Transformer-based sentiment analysis, a task where existing methods struggle to scale, demonstrating its ability to extract meaningful, interpretable logic rules and provide deeper insights.
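
To make the idea of "logic rules over hidden predicates" concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a hidden predicate is simply "neuron i at the chosen layer exceeds a fixed threshold", and it treats every binarized activation pattern observed for a class as one conjunction of that class's DNF rule. The function names, the thresholds, and the toy activations are all hypothetical, and the sketch omits the discriminative rule selection and the input-space grounding described above.

```python
from typing import Dict, Optional, Sequence, Set, Tuple

# A clause is one conjunction: the truth value of every hidden predicate.
Clause = Tuple[bool, ...]


def binarize(activations: Sequence[float], thresholds: Sequence[float]) -> Clause:
    """Turn raw activations into predicates: p_i holds when neuron i exceeds its threshold."""
    return tuple(a > t for a, t in zip(activations, thresholds))


def extract_dnf(
    samples: Sequence[Sequence[float]],
    labels: Sequence[str],
    thresholds: Sequence[float],
) -> Dict[str, Set[Clause]]:
    """Collect, per class, the set of distinct predicate patterns seen for that class.

    The DNF rule of a class is the disjunction of its collected clauses.
    (Assumption: no filtering for discriminative power is applied here.)
    """
    rules: Dict[str, Set[Clause]] = {}
    for acts, label in zip(samples, labels):
        rules.setdefault(label, set()).add(binarize(acts, thresholds))
    return rules


def format_dnf(clauses: Set[Clause]) -> str:
    """Render a set of clauses as a readable DNF string."""
    parts = []
    for clause in sorted(clauses):
        literals = [f"p{i}" if value else f"NOT p{i}" for i, value in enumerate(clause)]
        parts.append("(" + " AND ".join(literals) + ")")
    return " OR ".join(parts)


def classify(
    rules: Dict[str, Set[Clause]],
    activations: Sequence[float],
    thresholds: Sequence[float],
) -> Optional[str]:
    """Return the first class (in sorted order) whose DNF is satisfied, or None."""
    pattern = binarize(activations, thresholds)
    for label in sorted(rules):
        if pattern in rules[label]:
            return label
    return None


# Toy activations of two hidden neurons for four inputs (made-up values).
SAMPLES = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.6], [0.1, 0.9]]
LABELS = ["positive", "positive", "negative", "negative"]
THRESHOLDS = [0.5, 0.5]

RULES = extract_dnf(SAMPLES, LABELS, THRESHOLDS)

if __name__ == "__main__":
    for label in sorted(RULES):
        print(label, ":=", format_dnf(RULES[label]))
```

On this toy data the "positive" rule covers both patterns in which p0 holds, and the "negative" rule covers the single pattern in which p0 fails and p1 holds. An activation pattern that was never observed satisfies no rule, so `classify` returns `None` for it; a practical system would need a policy for such uncovered inputs and for patterns shared between classes.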
