TT-Sparse: Learning Sparse Rule Models with Differentiable Truth Tables

Hans Farrell Soegeng; Sarthak Ketanbhai Modi; Thomas Peyrin

TL;DR

TT-Sparse is a flexible neural building block that uses differentiable truth tables as nodes to learn sparse, effective connections. The resulting model can be converted exactly into compact, globally interpretable DNF/CNF Boolean formulas via Quine-McCluskey minimization.

Abstract

Interpretable machine learning is essential in high-stakes domains where decision-making requires accountability, transparency, and trust. While rule-based models offer global and exact interpretability, learning rule sets that simultaneously achieve high predictive performance and low, human-understandable complexity remains challenging. To address this, we introduce TT-Sparse, a flexible neural building block that leverages differentiable truth tables as nodes to learn sparse, effective connections. A key contribution of our approach is a new soft TopK operator with straight-through estimation for learning discrete, cardinality-constrained feature selection in an end-to-end differentiable manner. Crucially, the forward pass remains sparse, enabling efficient computation and exact symbolic rule extraction. As a result, each node (and the entire model) can be transformed exactly into compact, globally interpretable DNF/CNF Boolean formulas via Quine-McCluskey minimization. Extensive empirical results across 28 datasets spanning binary, multiclass, and regression tasks show that the learned sparse rules exhibit superior predictive performance with lower complexity compared to existing state-of-the-art methods.
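The soft TopK operator with straight-through estimation can be illustrated with a minimal pure-Python sketch. This is an assumption for illustration, not the paper's exact operator. The forward pass emits a hard 0/1 mask over the k largest mapping weights, so computation stays sparse. The backward pass treats that mask as if it were a smooth relaxation. The relaxation used here is a sigmoid around a threshold tau placed halfway between the k-th and (k+1)-th largest weights, with temperature T. Both the relaxation and the names `soft_topk_ste_forward` and `soft_topk_ste_backward` are hypothetical.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_topk_ste_forward(w_map, k, temperature=0.1):
    """Forward pass of a hypothetical soft TopK with straight-through estimation.

    Returns the hard 0/1 mask used in the forward pass (only the k largest
    mapping weights are active, so computation stays sparse) and the soft
    sigmoid mask that is kept for the backward pass.
    """
    n = len(w_map)
    if not 0 < k < n:
        raise ValueError("need 0 < k < len(w_map)")
    # Assumed relaxation: threshold tau halfway between the k-th and
    # (k+1)-th largest mapping weights.
    ranked = sorted(w_map, reverse=True)
    tau = 0.5 * (ranked[k - 1] + ranked[k])
    soft = [_sigmoid((w - tau) / temperature) for w in w_map]
    # Hard TopK: indices of the k largest weights (ties broken by lower index).
    order = sorted(range(n), key=lambda i: (-w_map[i], i))
    keep = set(order[:k])
    hard = [1.0 if i in keep else 0.0 for i in range(n)]
    return hard, soft


def soft_topk_ste_backward(grad_mask, soft, temperature=0.1):
    """Straight-through backward pass.

    The upstream gradient with respect to the hard mask is passed through the
    diagonal Jacobian of the soft mask, d soft_i / d w_i = soft_i * (1 - soft_i) / T,
    treating tau as a constant (stop-gradient).
    """
    return [g * s * (1.0 - s) / temperature for g, s in zip(grad_mask, soft)]


# Example: 4 candidate inputs, keep k = 2.
w_map = [0.9, -0.2, 0.5, 0.1]
hard, soft = soft_topk_ste_forward(w_map, k=2)
print(hard)  # [1.0, 0.0, 1.0, 0.0]
```

Inside a node, the hard mask gates the inputs: each input is multiplied by its logic weight only when its mask entry is 1. The loss gradient with respect to each mask entry is still defined for inactive connections. In this sketch, feeding that gradient to `soft_topk_ste_backward` lets currently unselected connections compete for a TopK slot while the forward computation stays discrete.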

Paper Structure

This paper contains 28 sections, 23 equations, 16 figures, 4 tables, 1 algorithm.

Introduction
Related Work
TT-Sparse
Components
TT-Sparse Design
Rule Extraction
Experiments
Ablation Study
Conclusion
Limitations
Future Work
Impact Statement
Datasets
Implementation Details
Hardware
...and 13 more sections

Figures (16)

Figure 1: Scatter plot of AUC score against rule log-complexity for 8 interpretable models and TabM (the state-of-the-art black-box baseline) across different hyperparameter settings, on the Diabetes and Heart tabular datasets.
Figure 2: A TT-Sparse model trained on the Heart dataset and converted to Boolean decision trees, achieving a test ROC-AUC of 91% with a complexity of 15. The sigmoid function $\sigma(\cdot)$ is applied to the sum of the final activated weights and the intercept term to obtain the probability of heart disease in $[0,1]$.
Figure 3: Overview of the TT-Sparse architecture. (Left) The hybrid model structure. The input vector is processed by a layer of Learnable Truth Table (LTT) nodes (blue trapezoids), which extract higher-order Boolean rules. These outputs are concatenated with the raw input features to form the final prediction. (Middle) Each potential connection is parameterized by the logic weight $W_{\text{LTT}}$ and the mapping weight $W_{\text{map}}$. Darker lines highlight the "active" connections, i.e. those whose $W_{\text{map}}$ is among the $k$ highest mapping weights for that node. Each LTT node is convertible to an equivalent CNF/DNF formula (see Figure 4). (Right) In the forward pass, an input is multiplied by $W_{\text{LTT}}$ only if the corresponding $W_{\text{map}}$ is in the TopK. In the backward pass, gradients flow through the Soft TopK relaxation, updating both $W_{\text{map}}$ and $W_{\text{LTT}}$. This design enables exact rule extraction while preserving gradient-based training through discrete connection selection.
Figure 4: Conversion of an LTT node to DNF by truth table enumeration of $2^n$ input combinations and obtaining the binary outputs with the LTT weight and bias parameters.
Figure 5: Predictive performance comparison (AUC/$R^2$, higher is better) between TT-Sparse (Soft TopK) and the slot-based Softmax baseline. Points below the diagonal indicate superior performance by TT-Sparse.
...and 11 more figures
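The conversion in Figure 4, followed by Quine-McCluskey minimization, can be sketched end to end in pure Python. For illustration, this sketch models an LTT node as a thresholded linear function of its k TopK-selected binary inputs, y = 1[w·x + b > 0]. That modelling choice is an assumption, and the paper's node may be parameterized differently. The sketch first enumerates all 2^k input combinations to obtain the exact truth table. It then merges minterms into prime implicants and keeps the essential primes. Finally, it covers any remaining minterms greedily. The greedy step replaces an exact method such as Petrick's method, and all function names are hypothetical. The resulting DNF is therefore always equivalent to the node but is not guaranteed to be the smallest.

```python
from itertools import product


def ltt_truth_table(weights, bias):
    """Enumerate all 2^n binary inputs; return the minterms (tuples) where the
    assumed node y = 1[w.x + b > 0] outputs 1."""
    return [bits for bits in product((0, 1), repeat=len(weights))
            if sum(w * x for w, x in zip(weights, bits)) + bias > 0]


def combine(a, b):
    """Merge two implicants over '0','1','-' that differ in exactly one fixed bit."""
    diff = [i for i, (p, q) in enumerate(zip(a, b)) if p != q]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None


def prime_implicants(minterms):
    """Quine-McCluskey merging: repeatedly combine terms; unmerged ones are prime."""
    current = {"".join(map(str, m)) for m in minterms}
    primes = set()
    while current:
        used, nxt = set(), set()
        items = sorted(current)
        for i, a in enumerate(items):
            for b in items[i + 1:]:
                c = combine(a, b)
                if c is not None:
                    nxt.add(c)
                    used.update((a, b))
        primes |= current - used
        current = nxt
    return primes


def covers(implicant, minterm):
    return all(p == "-" or p == m for p, m in zip(implicant, minterm))


def minimize_dnf(minterms):
    """Essential prime implicants first, then a greedy cover of the rest."""
    terms = {"".join(map(str, m)) for m in minterms}
    primes = prime_implicants(minterms)
    chosen = []
    for m in sorted(terms):
        covering = [p for p in primes if covers(p, m)]
        if len(covering) == 1 and covering[0] not in chosen:
            chosen.append(covering[0])
    uncovered = {m for m in terms if not any(covers(p, m) for p in chosen)}
    while uncovered:
        best = max(sorted(primes), key=lambda p: sum(covers(p, m) for m in uncovered))
        chosen.append(best)
        uncovered = {m for m in uncovered if not covers(best, m)}
    return chosen


def to_dnf(implicants, names):
    """Render implicants as a DNF string using ~ (NOT), & (AND), | (OR)."""
    if not implicants:
        return "False"
    if any(set(imp) == {"-"} for imp in implicants):
        return "True"
    clauses = []
    for imp in implicants:
        lits = [n if b == "1" else "~" + n for n, b in zip(names, imp) if b != "-"]
        clauses.append("(" + " & ".join(lits) + ")")
    return " | ".join(clauses)


# Example node with 3 selected inputs a, b, c.
mt = ltt_truth_table([2.0, 2.0, -3.0], -1.0)
print(to_dnf(minimize_dnf(mt), ["a", "b", "c"]))  # (b & ~c) | (a & ~c)
```

In this example, the node fires exactly on the inputs 010, 100 and 110, and the minimized rule reads as (a OR b) AND NOT c. Because the truth table has 2^k rows, exhaustive enumeration stays cheap only while each node's fan-in k is small. The cardinality constraint from TopK is what keeps that fan-in small.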
