Tabular Foundation Models Can Learn Association Rules
Erkan Karabulut, Daniel Daza, Paul Groth, Martijn C. Schut, Victoria Degeler
TL;DR
This work addresses association rule mining (ARM) for tabular data by using Tabular Foundation Models (TFMs) to learn association rules without frequent itemset mining. It introduces a model-agnostic framework for extracting rules from conditional probabilistic models and instantiates it as TabProbe, using TFMs such as TabPFN, TabICL, and TabDPT. Rules are derived in two steps, antecedent validation and consequent extraction, governed by the thresholds $\tau_a$ and $\tau_c$ respectively. The resulting rule sets are concise and high-quality, with strong predictive performance, particularly in small-data scenarios. The study discusses scalability and limitations (context size, single-target predictions) and outlines directions for extending the approach to additional models, data modalities, and interpretable downstream classifiers.
Abstract
Association Rule Mining (ARM) is a fundamental task for knowledge discovery in tabular data and is widely used in high-stakes decision-making. Classical ARM methods rely on frequent itemset mining, which leads to rule explosion and poor scalability. Recent neural approaches mitigate these issues but suffer from degraded performance in low-data regimes. Tabular foundation models (TFMs), pretrained on diverse tabular data with strong in-context generalization, provide a basis for addressing these limitations. We introduce a model-agnostic association rule learning framework that extracts association rules from any conditional probabilistic model over tabular data, enabling us to leverage TFMs. We then introduce TabProbe, an instantiation of our framework that utilizes TFMs as conditional probability estimators to learn association rules out-of-the-box without frequent itemset mining. We evaluate our approach on tabular datasets of varying sizes based on standard ARM rule quality metrics and downstream classification performance. The results show that TFMs consistently produce concise, high-quality association rules with strong predictive performance and remain robust in low-data settings without task-specific training. Source code is available at https://github.com/DiTEC-project/tabprobe.
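
To make the idea of extracting rules from a conditional probabilistic model more concrete, the following is a minimal sketch and not the TabProbe implementation. It rests on several assumptions that are not stated in the text above: the function names `predict_proba` and `extract_rules` are hypothetical, the toy table is made up, the conditional model is plain empirical counting standing in for a TFM, and the exact form of the antecedent validation check (each antecedent item must be predicted with probability at least $\tau_a$ given the other antecedent items) is only one plausible reading. Consequent extraction here keeps any value of another feature whose conditional probability reaches $\tau_c$.

```python
from itertools import combinations

# Toy categorical table, made up purely for illustration.
ROWS = [
    {"weather": "rain", "traffic": "heavy", "late": "yes"},
    {"weather": "rain", "traffic": "heavy", "late": "yes"},
    {"weather": "rain", "traffic": "light", "late": "no"},
    {"weather": "sun", "traffic": "light", "late": "no"},
    {"weather": "sun", "traffic": "light", "late": "no"},
    {"weather": "sun", "traffic": "heavy", "late": "yes"},
]


def predict_proba(rows, evidence, target):
    """Stand-in conditional model: empirical P(target = v | evidence).

    ASSUMPTION: in the framework this role would be played by a TFM;
    counting is used here only so the sketch runs without dependencies.
    """
    matching = [r for r in rows if all(r[f] == v for f, v in evidence)]
    if not matching:
        return {}
    counts = {}
    for r in matching:
        counts[r[target]] = counts.get(r[target], 0) + 1
    return {v: c / len(matching) for v, c in counts.items()}


def extract_rules(rows, tau_a=0.4, tau_c=0.9, max_antecedent=2):
    """Return (antecedent, consequent, probability) triples."""
    features = sorted(rows[0])
    items = sorted({(f, r[f]) for r in rows for f in features})
    rules = []
    for size in range(1, max_antecedent + 1):
        for antecedent in combinations(items, size):
            ante_features = {f for f, _ in antecedent}
            if len(ante_features) < size:
                continue  # allow at most one value per feature
            # Antecedent validation (ASSUMED form): every antecedent item
            # must have probability >= tau_a given the remaining items.
            valid = all(
                predict_proba(
                    rows, [i for i in antecedent if i != (f, v)], f
                ).get(v, 0.0) >= tau_a
                for f, v in antecedent
            )
            if not valid:
                continue
            # Consequent extraction: keep values of other features whose
            # conditional probability reaches tau_c.
            for target in features:
                if target in ante_features:
                    continue
                probs = predict_proba(rows, antecedent, target)
                for value, p in probs.items():
                    if p >= tau_c:
                        rules.append((antecedent, (target, value), p))
    return rules


for antecedent, consequent, p in extract_rules(ROWS):
    lhs = " AND ".join(f"{f}={v}" for f, v in antecedent)
    print(f"{lhs} -> {consequent[0]}={consequent[1]} (p={p:.2f})")
```

In this toy setting, raising `tau_a` prunes antecedents that the model itself considers unlikely, which is one way a rule set can stay concise without enumerating frequent itemsets first. Replacing `predict_proba` with calls to a pretrained tabular model would be the step that turns the sketch into something closer to the described approach, though the details of that step are not specified here.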
