Ensembling LLM-Induced Decision Trees for Explainable and Robust Error Detection

Mengqi Wang, Jianwei Wang, Qing Liu, Xiwei Xu, Zhenchang Xing, Liming Zhu, Wenjie Zhang

TL;DR

The paper tackles error detection in tabular data by moving beyond black-box LLM labeling to an explainable LLM-driven inducer framework. It introduces TreeED, which prompts an LLM to generate executable decision trees with rule and GNN nodes, and ForestED, which ensembles multiple TreeEDs via an EM-based consensus to improve robustness. Empirical results on seven diverse datasets show state-of-the-art accuracy and substantially reduced variance across runs, with an average F1 gain of 16.1% over the best baseline and strong explainability signals. The approach offers a practical, scalable path to trustworthy data quality tooling by combining symbolic and neural reasoning with principled ensemble fusion.

Abstract

Error detection (ED), which aims to identify incorrect or inconsistent cell values in tabular data, is important for ensuring data quality. Recent state-of-the-art ED methods leverage the pre-trained knowledge and semantic capability embedded in large language models (LLMs) to directly label whether a cell is erroneous. However, this LLM-as-a-labeler pipeline (1) relies on a black-box, implicit decision process and therefore fails to provide explainability for the detection results, and (2) is highly sensitive to prompts, yielding inconsistent outputs due to inherent model stochasticity, and therefore lacks robustness. To address these limitations, we propose an LLM-as-an-inducer framework that uses an LLM to induce a decision tree for ED (termed TreeED) and further ensembles multiple such trees for consensus detection (termed ForestED), thereby improving explainability and robustness. Specifically, based on prompts derived from data context, decision tree specifications and output requirements, TreeED queries the LLM to induce the decision tree skeleton, whose root-to-leaf decision paths specify the stepwise procedure for evaluating a given sample. Each tree contains three types of nodes: (1) rule nodes that perform simple validation checks (e.g., format or range), (2) Graph Neural Network (GNN) nodes that capture complex patterns (e.g., functional dependencies), and (3) leaf nodes that output the final decision (error or clean). Furthermore, ForestED employs uncertainty-based sampling to obtain multiple row subsets and constructs a decision tree for each subset using TreeED. It then leverages an Expectation-Maximization-based algorithm that jointly estimates tree reliability and optimizes the consensus ED prediction. Extensive experiments demonstrate that our methods are accurate, explainable and robust, achieving an average F1-score improvement of 16.1% over the best baseline.
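To make the tree structure in the abstract concrete, the following is a minimal sketch of a decision tree whose internal nodes run a check on a row and whose leaves return "error" or "clean". It is an illustration under stated assumptions, not the authors' implementation: the names `CheckNode`, `Leaf`, `evaluate` and the toy `ZIP_TO_CITY` lookup are hypothetical, and the node tagged `kind="gnn"` uses a plain dictionary lookup as a stand-in for a learned GNN classifier.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

Row = Dict[str, str]  # one table row: column name -> cell value


@dataclass
class Leaf:
    label: str  # "error" or "clean"


@dataclass
class CheckNode:
    name: str
    kind: str                      # "rule" or "gnn" (hypothetical tag)
    check: Callable[[Row], bool]   # True means the row passed this check
    if_pass: "Node"
    if_fail: "Node"


Node = Union[Leaf, CheckNode]


def evaluate(node: Node, row: Row) -> Tuple[str, List[str]]:
    """Walk from the root to a leaf; return the label and the path taken."""
    path: List[str] = []
    while isinstance(node, CheckNode):
        passed = node.check(row)
        path.append(f"{node.name}={'pass' if passed else 'fail'}")
        node = node.if_pass if passed else node.if_fail
    return node.label, path


# Toy lookup invented for this example; a real GNN node would score the
# cell with a trained model instead of consulting a dictionary.
ZIP_TO_CITY = {"10001": "New York", "94105": "San Francisco"}

tree = CheckNode(
    name="zip_format",
    kind="rule",
    check=lambda r: re.fullmatch(r"\d{5}", r["zip"]) is not None,
    if_fail=Leaf("error"),
    if_pass=CheckNode(
        name="zip_city_consistent",
        kind="gnn",  # stand-in for a dependency check learned by a GNN
        check=lambda r: ZIP_TO_CITY.get(r["zip"], r["city"]) == r["city"],
        if_fail=Leaf("error"),
        if_pass=Leaf("clean"),
    ),
)

label, path = evaluate(tree, {"zip": "94105", "city": "New York"})
print(label, path)
```

The returned path is what gives the explanation: each prediction comes with the ordered list of checks that led to it, rather than a bare label.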

Paper Structure

This paper contains 20 sections, 1 theorem, 29 equations, 8 figures, 8 tables, 3 algorithms.

Key Result

Lemma 1

Let $o_{i,j}(y)$ be any variational distribution over the latent label $\mathbf{Y}_{i,j}$, and let $\mathcal{F}$ denote the evidence lower bound (ELBO). For an ensemble of $R$ decision trees producing predictions $\hat{\mathbf{Y}}^{(r)}_{i,j}$, maximizing $\mathcal{F}$ with respect to either the variational posteriors $o$ (E-step) or the reliability matrices $\Theta$ (M-step) yields a monotonic increase in the m
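The alternating E-step and M-step named in the lemma can be illustrated with a generic Dawid-Skene-style EM for binary votes. This is a sketch under assumptions, not the paper's exact update rules: the function name `em_consensus`, the additive `smoothing` term, and the initialization from vote fractions are choices made for this example. Here `theta[r][y][k]` plays the role of a per-tree reliability matrix (the probability that tree `r` predicts `k` when the true label is `y`), and `post[n]` plays the role of the posterior that cell `n` is an error.

```python
import math
from typing import List, Tuple


def em_consensus(
    votes: List[List[int]], n_iter: int = 50, smoothing: float = 1.0
) -> Tuple[List[float], List[List[List[float]]]]:
    """votes[r][n] is tree r's 0/1 prediction for cell n (1 = error)."""
    R, N = len(votes), len(votes[0])
    # Initialise each posterior with the fraction of trees voting "error".
    post = [sum(votes[r][n] for r in range(R)) / R for n in range(N)]
    theta: List[List[List[float]]] = []
    for _ in range(n_iter):
        # M-step: re-estimate the class prior and each tree's reliability
        # matrix from the current soft labels.
        prior1 = sum(post) / N
        theta = []
        for r in range(R):
            counts = [[smoothing, smoothing], [smoothing, smoothing]]
            for n in range(N):
                k = votes[r][n]
                counts[1][k] += post[n]
                counts[0][k] += 1.0 - post[n]
            theta.append([[c / sum(row) for c in row] for row in counts])
        # E-step: recompute each cell's posterior given the reliabilities.
        new_post = []
        for n in range(N):
            log1 = math.log(prior1 + 1e-12)
            log0 = math.log(1.0 - prior1 + 1e-12)
            for r in range(R):
                k = votes[r][n]
                log1 += math.log(theta[r][1][k])
                log0 += math.log(theta[r][0][k])
            # Clamp the log-odds so math.exp cannot overflow.
            diff = max(min(log0 - log1, 700.0), -700.0)
            new_post.append(1.0 / (1.0 + math.exp(diff)))
        post = new_post
    return post, theta


# Two consistent trees and one noisy tree voting on eight cells (toy data).
votes = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 0],
]
post, theta = em_consensus(votes)
print([round(p, 3) for p in post])
```

Compared with a plain majority vote, this style of fusion weights each tree by its estimated reliability, so a tree that often disagrees with the consensus contributes less to the final label.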

Figures (8)

  • Figure 1: An illustrated example of common data errors.
  • Figure 2: Framework comparisons of LLM-based ED methods.
  • Figure 3: Framework overview of TreeED and ForestED.
  • Figure 4: Runtime and token cost across datasets.
  • Figure 5: Effect of varying the number of labeled records on model performance across different datasets.
  • ...and 3 more figures

Theorems & Definitions (5)

  • Example 1
  • Definition 1: Tabular Error Detection
  • Remark 1
  • Example 2
  • Lemma 1: Monotonicity of EM Updates