MoRE-LLM: Mixture of Rule Experts Guided by a Large Language Model
Alexander Koebler, Ingo Thon, Florian Buettner
TL;DR
MoRE-LLM addresses the challenge of trustworthy, interpretable predictions by coupling a data-driven black-box model with a learned rule set through a gating mechanism and by refining these rules with an LLM during training. The approach uses Anchors to generate local rule-based surrogates, and a constrained optimization framework based on Dynamic Barrier Gradient Descent to increase rule usage while keeping predictive performance close to that of an unconstrained model. A two-phase LLM process (adaptation during rule refinement, then pruning for alignment) embeds domain knowledge into the rule set without requiring LLM access at deployment. Experiments on tabular datasets show that MoRE-LLM can achieve accuracy competitive with non-interpretable baselines while providing domain-aligned, high-fidelity rule-based explanations for the predictions its rules cover.
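The dynamic barrier method minimizes an objective f subject to g(θ) ≤ 0 by stepping along v = ∇f + λ∇g, where λ = max((φ(θ) − ∇f·∇g)/‖∇g‖², 0) is the smallest non-negative weight that guarantees ⟨∇g, v⟩ ≥ φ(θ). The following is a minimal pure-Python sketch of one such update, assuming the control barrier φ(θ) = α·g(θ). The mapping of f to "rule usage" and g to "allowed loss gap to the unconstrained model", as well as the names `dbgd_step`, `lr`, and `alpha`, are illustrative assumptions, not the paper's exact formulation or hyperparameters.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def dbgd_step(
    theta: Vector,
    grad_f: Callable[[Vector], Vector],  # gradient of the objective (assumed: rule-usage loss)
    g: Callable[[Vector], float],        # constraint value, feasible when g(theta) <= 0
    grad_g: Callable[[Vector], Vector],  # gradient of the constraint
    lr: float = 0.1,
    alpha: float = 1.0,
) -> Tuple[Vector, float]:
    """One dynamic-barrier update; returns (new parameters, barrier weight lambda)."""
    df = grad_f(theta)
    dg = grad_g(theta)
    norm_sq = dot(dg, dg)
    # Control barrier phi = alpha * g: negative while the constraint has slack,
    # positive when it is violated (forcing g to decrease).
    phi = alpha * g(theta)
    # Smallest lambda >= 0 such that v = df + lambda * dg satisfies <dg, v> >= phi.
    lam = max((phi - dot(df, dg)) / norm_sq, 0.0) if norm_sq > 0 else 0.0
    v = [a + lam * b for a, b in zip(df, dg)]
    return [t - lr * vi for t, vi in zip(theta, v)], lam


if __name__ == "__main__":
    # Toy problem: f(x) = -x rewards a larger "rule weight" x,
    # while g(x) = x - 2 <= 0 caps how much extra loss is tolerated.
    theta = [0.0]
    for _ in range(200):
        theta, lam = dbgd_step(theta, lambda t: [-1.0], lambda t: t[0] - 2.0, lambda t: [1.0])
    print(round(theta[0], 4))  # prints 2.0
```

In this toy run λ stays at zero while the constraint has slack, so the update simply follows −∇f; once x exceeds 1, λ becomes positive and x settles at the constraint boundary x = 2 instead of overshooting it. This mirrors the stated goal of pushing rule usage up only as far as the performance constraint allows.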
Abstract
To ensure the trustworthiness and interpretability of AI systems, it is essential to align machine learning models with human domain knowledge. This can be a challenging and time-consuming endeavor that requires close communication between data scientists and domain experts. Recent leaps in the capabilities of Large Language Models (LLMs) can help alleviate this burden. In this paper, we propose a Mixture of Rule Experts guided by a Large Language Model (MoRE-LLM) which combines a data-driven black-box model with knowledge extracted from an LLM to enable domain knowledge-aligned and transparent predictions. While the introduced Mixture of Rule Experts (MoRE) steers the discovery of local rule-based surrogates during training and their utilization for the classification task, the LLM is responsible for enhancing the domain knowledge alignment of the rules by correcting and contextualizing them. Importantly, our method does not rely on access to the LLM during test time and ensures interpretability while not being prone to LLM-based confabulations. We evaluate our method on several tabular data sets and compare its performance with interpretable and non-interpretable baselines. Besides performance, we evaluate our grey-box method with respect to the utilization of interpretable rules. In addition to our quantitative evaluation, we shed light on how the LLM can provide additional context to strengthen the comprehensibility and trustworthiness of the model's reasoning process.
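Because the LLM is used only during training, prediction at test time needs just the learned rules, a gate, and the black-box model. The sketch below illustrates that separation under loudly stated assumptions: the `Rule` class with interval conditions, the scalar gate score compared against a `threshold`, the first-covering-rule tie-break, and the feature names are all hypothetical and not taken from the paper. It also computes a simple rule-utilization fraction, one plausible way to quantify the "utilization of interpretable rules" that the authors evaluate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence, Tuple

Sample = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a conjunction of interval conditions and a class label."""

    conditions: Tuple[Tuple[str, float, float], ...]  # (feature, low, high): low <= x < high
    label: int
    description: str = ""

    def covers(self, x: Sample) -> bool:
        return all(low <= x[feature] < high for feature, low, high in self.conditions)


def predict(
    x: Sample,
    rules: Sequence[Rule],
    gate: Callable[[Sample], float],
    black_box: Callable[[Sample], int],
    threshold: float = 0.5,
) -> Tuple[int, Optional[Rule]]:
    """Return (label, rule); rule is None when the black box made the prediction.

    No LLM is called here: the rule set was fixed during training.
    """
    if gate(x) >= threshold:
        for rule in rules:  # first covering rule wins (an assumed tie-break)
            if rule.covers(x):
                return rule.label, rule
    return black_box(x), None


def rule_utilization(
    samples: Sequence[Sample],
    rules: Sequence[Rule],
    gate: Callable[[Sample], float],
    black_box: Callable[[Sample], int],
    threshold: float = 0.5,
) -> float:
    """Fraction of samples answered by a rule rather than by the black box."""
    if not samples:
        return 0.0
    hits = sum(predict(x, rules, gate, black_box, threshold)[1] is not None for x in samples)
    return hits / len(samples)


if __name__ == "__main__":
    rules = [Rule((("age", 0.0, 25.0),), label=0, description="age < 25 -> class 0")]

    def gate(x: Sample) -> float:  # toy gate: trust rules unless hours_per_week >= 60
        return 0.9 if x["hours_per_week"] < 60 else 0.1

    def black_box(x: Sample) -> int:  # stand-in for the trained black-box model
        return 1

    samples = [
        {"age": 20.0, "hours_per_week": 40.0},  # gate open, rule covers -> rule
        {"age": 50.0, "hours_per_week": 40.0},  # gate open, no rule covers -> black box
        {"age": 22.0, "hours_per_week": 70.0},  # gate closed -> black box
    ]
    for x in samples:
        label, rule = predict(x, rules, gate, black_box)
        print(label, rule.description if rule else "black box")
    print(rule_utilization(samples, rules, gate, black_box))  # prints 0.3333333333333333
```

Returning the matched rule alongside the label keeps each rule-based prediction directly explainable, while inputs outside the rules' coverage (or where the gate defers) fall back to the black box.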
