Neuro-Symbolic Rule Lists

Sascha Xu, Nils Philipp Walter, Jilles Vreeken

TL;DR

NeuRules is an end-to-end trainable model that unifies discretization, rule learning, and rule order in a single differentiable framework. It consistently outperforms both combinatorial and neuro-symbolic methods.

Abstract

Machine learning models deployed in sensitive areas such as healthcare must be interpretable to ensure accountability and fairness. Rule lists (if Age < 35 $\wedge$ Priors > 0 then Recidivism = True, else if Next Condition ...) offer full transparency, making them well-suited for high-stakes decisions. However, learning such rule lists presents significant challenges. Existing methods based on combinatorial optimization require feature pre-discretization and impose restrictions on rule size. Neuro-symbolic methods use more scalable continuous optimization, yet they impose similar pre-discretization constraints and suffer from unstable optimization. To address these limitations, we introduce NeuRules, an end-to-end trainable model that unifies discretization, rule learning, and rule order into a single differentiable framework. We formulate a continuous relaxation of the rule list learning problem that converges to a strict rule list through temperature annealing. NeuRules learns both the discretizations of individual features and their combination into conjunctive rules, without any pre-processing or restrictions. Extensive experiments demonstrate that NeuRules consistently outperforms both combinatorial and neuro-symbolic methods, effectively learning simple and complex rules, as well as their order, across a wide range of datasets.
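To make the notion of a strict rule list concrete, the following minimal sketch evaluates one in Python: rules are checked in order and the first one whose condition holds determines the prediction. Only the first rule is taken from the example in the abstract. The `predict` helper, the `Record` and `Rule` types, and the fallback value `default=False` are hypothetical and introduced here for illustration only.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
# A rule pairs a condition over the record with a consequent (the predicted label).
Rule = Tuple[Callable[[Record], bool], bool]


def predict(rule_list: List[Rule], default: bool, x: Record) -> bool:
    """Return the consequent of the first rule whose condition holds for x."""
    for condition, consequent in rule_list:
        if condition(x):
            return consequent
    # No rule fired: fall back to the default (a hypothetical choice, not from the paper).
    return default


# First rule from the abstract: if Age < 35 AND Priors > 0 then Recidivism = True.
rules: List[Rule] = [
    (lambda x: x["Age"] < 35 and x["Priors"] > 0, True),
]

young_with_priors = predict(rules, False, {"Age": 30.0, "Priors": 2.0})
older_with_priors = predict(rules, False, {"Age": 40.0, "Priors": 2.0})
```

Because the rules are evaluated in order, both the thresholds inside each condition and the position of each rule in the list affect the prediction, which is why the abstract treats discretization, rule learning, and rule order as one joint problem.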

Paper Structure

This paper contains 41 sections, 40 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Rule list learned with $\textsc{NeuRules}$ on the Heart Disease dataset. $\textsc{NeuRules}$ jointly optimizes thresholds, rule aggregation and ordering into a rule list.
  • Figure 2: $\textsc{NeuRules}$ architecture. The input $\textbf{x} \in {\mathbb{R}}^{d}$ is discretized into soft predicates $\hat{\pi}$ using learnable thresholds $\mathbf{\alpha}_j, \mathbf{\beta}_j \in {\mathbb{R}}^{d}$, which are then combined into $k$ rules $\hat{a}_j(\textbf{x})$. The rules are sorted by their priority $p_j$, and the Gumbel-Softmax function yields $\hat{I}_j(\textbf{x})$, an approximation of the indicator function $I_j$ of the active rule with the highest priority. The final prediction is the sum of the consequents $c_j$ weighted by the indicator $I_j$.
  • Figure 3: The soft predicate with different temperatures (a) approaches the true thresholding with decreasing temperature (b). Multiple soft predicates are combined into a conjunctive rule (c).
  • Figure 4: Weight of the highest-priority rule $\hat{I}_{\max}(\textbf{x})$ during training with decreasing temperature $t_{rl}$. The grey area corresponds to the variance.
  • Figure 5: $\textsc{NeuRules}$ is accurate for both short and long rule lists (a). The lengths of the learned rules follow a power law (b): most rules are succinct and some are detailed. Using the relaxed conjunction $\hat{a}(\textbf{x})\leq \epsilon$ is always better (blue area) and improves the $F_1$ score on average by 0.3 (c).
  • ...and 1 more figure
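
The pipeline described in the captions of Figures 2 and 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's formulation: the soft predicate is modelled as a product of two sigmoids, the conjunction as a plain product of predicates, and the rule selection as a deterministic temperature-scaled softmax standing in for the Gumbel-Softmax. All function names and the example thresholds, priorities, and consequents are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_predicate(x: float, alpha: float, beta: float, t: float) -> float:
    """Soft version of alpha < x < beta (assumed form: product of two sigmoids).

    As the temperature t decreases, the value approaches a hard 0/1 threshold.
    """
    return sigmoid((x - alpha) / t) * sigmoid((beta - x) / t)


def soft_rule(x: Sequence[float], alphas: Sequence[float],
              betas: Sequence[float], t: float) -> float:
    """Soft conjunction of one predicate per feature (assumed form: product)."""
    activation = 1.0
    for xi, a, b in zip(x, alphas, betas):
        activation *= soft_predicate(xi, a, b, t)
    return activation


def soft_selection(activations: Sequence[float], priorities: Sequence[float],
                   t_rl: float) -> List[float]:
    """Weights concentrating on the active rule with the highest priority.

    Deterministic softmax over log(activation) + priority / t_rl, used here
    in place of the Gumbel-Softmax. Assumes at least one activation is > 0.
    """
    logits = [math.log(a) + p / t_rl if a > 0.0 else -math.inf
              for a, p in zip(activations, priorities)]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x: Sequence[float], t: float, t_rl: float) -> float:
    """Weighted sum of consequents for a hypothetical two-rule list."""
    inf = math.inf
    # Rule 0: Age < 35 and Priors > 0 (features ordered as [Age, Priors]).
    # Rule 1: default rule with open bounds, so it is always active.
    alphas = [[-inf, 0.0], [-inf, -inf]]
    betas = [[35.0, inf], [inf, inf]]
    priorities = [1.0, 0.0]
    consequents = [1.0, 0.0]
    activations = [soft_rule(x, a, b, t) for a, b in zip(alphas, betas)]
    weights = soft_selection(activations, priorities, t_rl)
    return sum(c * w for c, w in zip(consequents, weights))
```

For example, `predict([30.0, 2.0], t=0.1, t_rl=0.1)` is close to 1 because the first rule is active and has the highest priority, while `predict([40.0, 2.0], t=0.1, t_rl=0.1)` is close to 0 because only the default rule fires. Lowering `t` and `t_rl` pushes both the predicates and the selection weights toward hard 0/1 values, which is the intuition behind the temperature annealing mentioned in the abstract.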