Learning Hyperplane Tree: A Piecewise Linear and Fully Interpretable Decision-making Framework

Hongyi Li, Jun Xu, William Ward Armstrong

TL;DR

The paper addresses the need for accurate and transparent models on tabular data, especially when samples are few. It introduces the Learning Hyperplane Tree (LHT), a fully interpretable, piecewise-linear tree that partitions data with hyperplanes and classifies at its leaves with a membership function derived from least-squares fitting and fuzzy logic. An LH forest extension (WOLF) aggregates multiple LHTs to boost accuracy, and experiments on nine UCI datasets show LHT reaching accuracy comparable to state-of-the-art models with fast inference. The work emphasizes interpretability through explicit feature weights in the branching blocks and points to future work on applying the method to homogeneous data and responsible-AI contexts.

Abstract

This paper introduces a novel tree-based model, Learning Hyperplane Tree (LHT), which outperforms state-of-the-art (SOTA) tree models on classification tasks across several public datasets. The structure of LHT is simple and efficient: it partitions the data using several hyperplanes to progressively distinguish between target and non-target class samples. Although the separation is not perfect at each stage, LHT improves the distinction through successive partitions. During testing, a sample is classified by evaluating the hyperplanes defined in the branching blocks and traversing down the tree until it reaches the corresponding leaf block. The class of the test sample is then determined using the piecewise linear membership function defined in the leaf blocks, which is derived through least-squares fitting and fuzzy logic. LHT is highly transparent and interpretable: at each branching block, the contribution of each feature to the classification can be clearly observed.
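
The test-time procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Branch` and `Leaf` classes, the convention of going left when the feature-weighted sum falls below the threshold, and the clipping of the leaf's linear output to [0, 1] are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    # Hypothetical leaf block: a linear fit w.x + b used as a membership value.
    weights: List[float]
    bias: float


@dataclass
class Branch:
    # Hypothetical branching block: one hyperplane w.x = threshold.
    weights: List[float]
    threshold: float
    left: "Node"
    right: "Node"


Node = Union[Leaf, Branch]


def dot(w: List[float], x: List[float]) -> float:
    """Feature-weighted sum of one sample."""
    return sum(wi * xi for wi, xi in zip(w, x))


def membership(node: Node, x: List[float]) -> float:
    """Walk down the tree, then evaluate the leaf's linear function."""
    while isinstance(node, Branch):
        # Assumed convention: sums below the threshold go to the left subblock.
        node = node.left if dot(node.weights, x) < node.threshold else node.right
    raw = dot(node.weights, x) + node.bias
    # Assumption: clip to [0, 1] so the value reads as a degree of membership.
    return min(1.0, max(0.0, raw))


# Toy tree with made-up numbers: the left leaf always returns 0,
# the right leaf returns 0.5 * x[1] (clipped).
tree = Branch(
    weights=[1.0, 0.0],
    threshold=2.0,
    left=Leaf(weights=[0.0, 0.0], bias=0.0),
    right=Leaf(weights=[0.0, 0.5], bias=0.0),
)

print(membership(tree, [1.0, 5.0]))  # left leaf  -> 0.0
print(membership(tree, [3.0, 1.0]))  # right leaf -> 0.5
print(membership(tree, [3.0, 4.0]))  # right leaf, clipped -> 1.0
```

Because every decision is a weighted sum compared against a threshold, the weights stored in each `Branch` can be read directly as per-feature contributions, which is the sense of interpretability the abstract describes.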

Paper Structure

This paper contains 13 sections, 32 equations, 6 figures, 3 tables, 2 algorithms.

Figures (6)

  • Figure 1: The structure of LHT is illustrated. LHT consists of two types of blocks: a branching block, which employs hyperplanes for sample partitioning, and a leaf block, where least-squares fitted membership functions are used for classifying test samples.
  • Figure 2: Illustration of the case where $N_3 = N_{\max}$ and $N_{\max} \geq \gamma$, with $c = \min \text{TFS}$. Samples with a feature-weighted sum smaller than $c$ are assigned to left subblock 1, and the remaining samples are assigned to right subblock 2. Since all samples in left subblock 1 are non-target class samples, it is marked as a leaf block. Right subblock 2 still contains mixed samples, and the allocation process continues based on the data within the block until all samples are properly classified.
  • Figure 3: The LHT structures of the three classes in the wine dataset are shown, with the left side corresponding to the case where $\beta=0$, and the right side to the case where $\beta=0.25$.
  • Figure 4: The LHT feature weight visualization for class 0 of the wine dataset is shown, with the left side corresponding to the case of $\beta=0$ and the right side to the case of $\beta=0.25$.
  • Figure 5: Visualization of the feature weights for each branching block of the three LHTs corresponding to the three classes in the wine dataset ($\beta=0$).
  • ...and 1 more figure
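
The split described in the Figure 2 caption can be sketched as below. This is a hedged illustration only: it assumes that TFS denotes the feature-weighted sums of the target-class samples (an interpretation inferred from the caption, not stated in it), it takes the hyperplane weights as given rather than learning them, and it does not model the $N_3$, $N_{\max}$, or $\gamma$ conditions. The function name `split_at_min_target_sum` is hypothetical.

```python
from typing import List, Tuple


def split_at_min_target_sum(
    samples: List[List[float]],
    labels: List[int],
    weights: List[float],
) -> Tuple[float, List[int], List[int]]:
    """Split sample indices at c = min of the target samples' weighted sums.

    labels use 1 for the target class and 0 for non-target (assumed encoding).
    """
    sums = [sum(w * v for w, v in zip(weights, x)) for x in samples]
    # Assumed meaning of "min TFS": smallest weighted sum among target samples.
    c = min(s for s, y in zip(sums, labels) if y == 1)
    left = [i for i, s in enumerate(sums) if s < c]    # below c: left subblock
    right = [i for i, s in enumerate(sums) if s >= c]  # the rest: right subblock
    return c, left, right


# Toy one-feature data with made-up values.
samples = [[0.0], [1.0], [2.0], [3.0], [4.0]]
labels = [0, 0, 1, 0, 1]
c, left, right = split_at_min_target_sum(samples, labels, weights=[1.0])

print(c)      # 2.0
print(left)   # [0, 1]
print(right)  # [2, 3, 4]
```

By construction no target sample can fall below `c`, so the left subblock holds only non-target samples and can be closed as a leaf, while the right subblock (here still mixed) would be split again.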