Efficiently Learning Probabilistic Logical Models by Cheaply Ranking Mined Rules

Jonathan Feldstein; Dominic Phillips; Efthymia Tsamoura

TL;DR

This work tackles the critical bottleneck of scalable structure learning for probabilistic logical models by introducing SPECTRUM, a framework that combines linear-time pattern mining and rule evaluation with a novel utility measure that balances precision, recall, symmetry, priors, and complexity. The approach yields a quadratic-time optimisation over a carefully restricted rule space, enabling CPU-scale learning on datasets with millions of facts while achieving competitive or superior accuracy to neural baselines. Theoretical guarantees accompany the practical pipeline, including ε-uncertainty bounds for pattern-based utility estimates and completeness for patterns within the N-close neighborhood. Empirically, SPECTRUM scales to large benchmarks and retrieves hand-engineered rules on CAD and Yelp, with CPU runtimes orders of magnitude faster than state-of-the-art neural structure learners, highlighting its potential to broaden adoption of interpretable neurosymbolic reasoning in real-world domains.

Abstract

Probabilistic logical models are a core component of neurosymbolic AI and are important in their own right for tasks that require high explainability. Unlike neural networks, logical theories that underlie the model are often handcrafted using domain expertise, making their development costly and prone to errors. While there are algorithms that learn logical theories from data, they are generally prohibitively expensive, limiting their applicability in real-world settings. Here, we introduce precision and recall for logical rules and define their composition as rule utility - a cost-effective measure of the predictive power of logical theories. We also introduce SPECTRUM, a scalable framework for learning logical theories from relational data. Its scalability derives from a linear-time algorithm for mining recurrent subgraphs in the data graph along with a second algorithm that, using a utility measure that can be computed in linear time, efficiently ranks rules derived from these subgraphs. Finally, we prove theoretical guarantees on the utility of the learnt logical theory. As a result, we demonstrate across various tasks that SPECTRUM scales to larger datasets, often learning more accurate logical theories on CPUs in < 1% the runtime of SOTA neural network approaches on GPUs.
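As a rough illustration of what precision and recall can mean for a logical rule, the sketch below evaluates one hypothetical rule, friends(x, y) ∧ likes(x, z) → likes(y, z), on the three-fact toy dataset from the Figure 4 caption. The rule, the function names, and the counting scheme (precision as the fraction of predicted head facts present in the data, recall as the fraction of head-predicate facts the rule predicts) are assumptions made for illustration. They are not the paper's formal definitions, and the utility measure that combines them is not reproduced here.

```python
from itertools import product

# Toy facts from the Figure 4 caption, encoded as (predicate, arg1, arg2).
FACTS = {
    ("likes", "alice", "star wars"),
    ("friends", "alice", "bob"),
    ("likes", "bob", "star wars"),
}


def body_groundings(facts):
    """Return substitutions (x, y, z) satisfying friends(x, y) AND likes(x, z).

    Hypothetical rule body, chosen for illustration only.
    """
    friends = [(a, b) for (p, a, b) in facts if p == "friends"]
    likes = [(a, b) for (p, a, b) in facts if p == "likes"]
    return sorted(
        (x, y, z)
        for (x, y), (x2, z) in product(friends, likes)
        if x == x2
    )


def rule_precision_recall(facts):
    """Illustrative precision and recall for the rule body -> likes(y, z).

    Assumed counting scheme, not the paper's definition.
    """
    groundings = body_groundings(facts)
    # Head facts the rule predicts, one per distinct (y, z) pair.
    predicted = {("likes", y, z) for (_, y, z) in groundings}
    # All facts in the data that share the head predicate.
    head_facts = {f for f in facts if f[0] == "likes"}
    correct = predicted & head_facts
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(head_facts) if head_facts else 0.0
    return precision, recall


if __name__ == "__main__":
    print(body_groundings(FACTS))
    print(rule_precision_recall(FACTS))
```

On this data the body has a single grounding, the one predicted fact is present in the data, and one of the two likes facts is predicted by the rule.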

Paper Structure (58 sections, 4 theorems, 36 equations, 5 figures, 5 tables, 2 algorithms)


Introduction
Motivation.
Limitations of state-of-the-art.
Contributions.
Empirical results.
Restrictions.
Related Work
ILP.
Differentiable methods.
Preliminaries
First-order logic (FOL).
Hypergraphs.
Patterns.
Rule Utility
...and 43 more sections

Key Result

Theorem 3

The pattern occurrence distribution $P_{\mathcal{D}}$ is the function that maps each connected pattern to the number of its groundings in $\mathcal{D}$. If $P_{\mathcal{D}}$ is Zipfian, then to ensure that $\mathsf{U}(\bm{\rho})$ is $\varepsilon$-uncertain, the upper bound on $N$ in Algorithm alg:fr

Figures (5)

Figure 1: Datagraph $\overline{\mathcal{G}}_{\mathcal{D}}$ for $\mathcal{D}$ in Example \ref{example:fol}.
Figure 2: The patterns of $\rho_1$ and $\mathsf{body}(\rho_1)$.
Figure 3: SPECTRUM's linear scaling in dataset size.
Figure 4: Datagraph $\overline{\mathcal{G}}_{\mathcal{D}}$ for a dataset $\mathcal{D} = \{\textsc{likes}(\mathtt{alice}, \mathtt{star wars}), \textsc{friends}(\mathtt{alice}, \mathtt{bob}), \textsc{likes}(\mathtt{bob}, \mathtt{star wars})\}$. Constants $\mathtt{alice}$, $\mathtt{bob}$, and $\mathtt{star wars}$ are abbreviated as $\mathtt{a}$, $\mathtt{b}$, and $\mathtt{s}$. For this data, rule $\rho_1$ has a single grounding. However, the number of isomorphisms of $\mathcal{G}_{\mathsf{body}(\rho_1)}$ into $\mathcal{G}_{\mathsf{body}(\rho_1) \wedge \mathsf{head}(\rho_1)}$ is $2$, hence $\mathsf{S}(\rho_1) = 2$.
Figure 5: Graph $\overline{\mathcal{G}}_{\mathcal{D}}$ from Example \ref{example:patternmining}. The graph contains three types of labelled edges: red unary edges $\textsc{p}_1$, dashed black binary edges $\textsc{p}_2$, and solid black binary edges $\textsc{p}_3$.

Theorems & Definitions (22)

Example 1
Definition 1
Definition 2
Example 2
Definition 3
Definition 4
Definition 5
Definition 6
Definition 7
Definition 8
...and 12 more
