Talking Trees: Reasoning-Assisted Induction of Decision Trees for Tabular Data

George Yakushev, Alina Shutova, Ivan Rubachev, Natalia Bereberdina, Renat Sergazinov, Artem Babenko

TL;DR

This work designs a minimal set of tools for constructing, analyzing, and manipulating decision trees, and shows that a single decision tree constructed via the agentic loop can be competitive with state-of-the-art black-box models on tabular benchmarks, while also providing a human-readable reasoning trace that can be checked for biases and data leaks.

Abstract

Tabular foundation models are becoming increasingly popular for low-resource tabular problems. These models compensate for small training datasets by pretraining on large volumes of data. The prior knowledge obtained via pretraining provides exceptional performance, but the resulting model becomes a black box that is difficult to interpret and costly to run inference on. In this work, we explore an alternative strategy that is both more lightweight and controllable: using reasoning-capable LLMs to induce decision trees for small tabular datasets in an agentic setup. We design a minimal set of tools for constructing, analyzing, and manipulating decision trees. Using these tools, an LLM agent combines its prior knowledge with the user-specified constraints and learning from data to create lightweight decision trees. We show that a single decision tree constructed via the agentic loop can be competitive with state-of-the-art black-box models on tabular benchmarks, while also providing a human-readable reasoning trace that can be checked for biases and data leaks. Additionally, we show the model can incorporate fairness and monotonicity constraints.
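To make the idea of a tree-editing toolset concrete, the following is a minimal sketch in Python. It is not the interface from the paper: the names `Node`, `get_node`, `split_node`, `predict`, and `accuracy` are hypothetical, and the four-row dataset is invented toy data. It only illustrates how an agent could alternate between an editing action (splitting a leaf) and an analysis observation (re-measuring accuracy).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A tree node: a leaf when `feature` is None, otherwise a threshold split."""
    prediction: int = 0
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None   # rows with row[feature] <= threshold
    right: Optional["Node"] = None  # rows with row[feature] > threshold


def get_node(root, path):
    """Follow a path such as "LR" (left, then right) down from the root."""
    node = root
    for step in path:
        if node.feature is None:
            raise ValueError(f"path {path!r} goes past a leaf")
        node = node.left if step == "L" else node.right
    return node


def split_node(root, path, feature, threshold, left_pred, right_pred):
    """Hypothetical editing tool: make the node at `path` a split.

    Any subtree that already hangs below the node is replaced.
    """
    node = get_node(root, path)
    node.feature, node.threshold = feature, threshold
    node.left, node.right = Node(left_pred), Node(right_pred)


def predict(root, row):
    """Route one row (a dict of feature values) to a leaf."""
    node = root
    while node.feature is not None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.prediction


def accuracy(root, rows, labels):
    """Hypothetical analysis tool: fraction of rows predicted correctly."""
    hits = sum(predict(root, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


# Invented toy data, used only to exercise the tools.
rows = [
    {"x1": 90.0, "x2": 22.0},
    {"x1": 150.0, "x2": 31.0},
    {"x1": 110.0, "x2": 35.0},
    {"x1": 100.0, "x2": 24.0},
]
labels = [0, 1, 1, 0]

tree = Node(prediction=0)                       # start from a single leaf
acc_leaf = accuracy(tree, rows, labels)         # observation before any edit
split_node(tree, "", "x1", 120.0, 0, 1)         # action: split the root
acc_one_split = accuracy(tree, rows, labels)
split_node(tree, "L", "x2", 30.0, 0, 1)         # action: refine the left child
acc_two_splits = accuracy(tree, rows, labels)
```

In an agentic setup, calls like `split_node` and `accuracy` would be issued by the LLM as tool calls, with the returned values fed back as observations. Here they are called directly so the sketch stays self-contained.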

Paper Structure

This paper contains 25 sections, 6 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Informal overview. We prompt an LLM agent to construct a decision tree in a thought--action--observation cycle (left). During the action phase, the agent uses a tree-editing framework (right) with tools for analyzing and modifying trees.
  • Figure 2: Distribution of tool calls across LLM backbones and datasets, categorized by functionality.
  • Figure 3: a) Fairness evaluation on the Adult dataset across three setups: LLM-built trees with and without the fairness prompt, and the sklearn baseline. b) Experiment on the Diabetes dataset: performance of trees trained with and without access to the Glucose feature. Both experiments use the GPT-5 backbone and the setup from Section \ref{sec:perf-eval}.
  • Figure 4: Word cloud of function calls by category.
  • Figure 5: Visual analysis of monotonicity on the COMPAS dataset with brief prompt instructions.