Permutation Decision Trees

Harikrishnan N B, Arham Jain, Nithin Nagaraj

TL;DR

Standard impurity measures like Shannon entropy and Gini impurity ignore order, limiting modeling of temporal dependencies. The authors introduce Effort-To-Compress ($ETC$) as a structural impurity and develop Permutation Decision Trees (PDT) and Permutation Decision Forests (PDF) to produce permutation-sensitive models. PDT yields trees that reflect order-dependent patterns and performs comparably to classical decision trees across real datasets, while PDF matches Random Forest performance with far fewer estimators (21 vs 50–1000). The work offers improved interpretability and efficiency for temporally ordered data and points to future work on theoretical links to Kolmogorov complexity and non-iid data.

Abstract

The decision tree is a well-understood machine learning model that is built by minimizing impurity at the internal nodes. The most common impurity measures are Shannon entropy and Gini impurity. Both are insensitive to the order of the training data, so the final tree is invariant to any permutation of the data. This limits modeling when there are temporal order dependencies between data instances. In this research, we propose, for the first time, Effort-To-Compress (ETC), a complexity measure, as an alternative impurity measure. Unlike Shannon entropy and Gini impurity, structural impurity based on ETC captures order dependencies in the data, so different permutations of the same data instances can yield different decision trees, a concept we term Permutation Decision Trees (PDT). We then introduce Permutation Bagging, which is achieved using permutation decision trees without the need for random feature selection or sub-sampling. We compare Permutation Decision Trees with classical decision trees across various real-world datasets, including Appendicitis, Breast Cancer Wisconsin, Diabetes Pima Indian, Ionosphere, Iris, Sonar, and Wine. PDT performs comparably to classical decision trees on most datasets and, in certain instances, slightly surpasses them. In comparing Permutation Bagging with Random Forest, we attain performance comparable to Random Forest models of 50 to 1000 trees using only 21 trees, which highlights the efficiency of Permutation Bagging.
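To make the order-sensitivity claim concrete, the following minimal sketch compares Shannon entropy, Gini impurity, and ETC on two orderings of the same labels. The abstract does not spell out how ETC is computed, so the `etc` function below is an assumption: it follows the commonly described pair-substitution procedure (repeatedly replace the most frequent adjacent pair with a new symbol until the sequence is constant or has length 1, and count the iterations). Tie-breaking and the counting of overlapping pairs are illustrative choices, and labels are assumed to be non-negative integers.

```python
from collections import Counter
from math import log2


def shannon_entropy(labels):
    """Shannon entropy in bits; depends only on label counts."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini_impurity(labels):
    """Gini impurity; also depends only on label counts."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def etc(labels):
    """Assumed ETC: number of pair-substitution steps until the sequence
    becomes constant or has length 1 (integer labels assumed)."""
    seq = list(labels)
    next_symbol = max(seq) + 1  # fresh symbol not yet present in seq
    steps = 0
    while len(seq) > 1 and len(set(seq)) > 1:
        # Count adjacent pairs; ties go to the pair seen first (assumption).
        pair_counts = Counter(zip(seq, seq[1:]))
        target = max(pair_counts, key=pair_counts.get)
        out, i = [], 0
        while i < len(seq):
            # Left-to-right scan, replacing non-overlapping occurrences.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == target:
                out.append(next_symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_symbol += 1
        steps += 1
    return steps


grouped = [0, 0, 0, 0, 1, 1, 1, 1]
alternating = [0, 1, 0, 1, 0, 1, 0, 1]  # same labels, different order

for name, seq in [("grouped", grouped), ("alternating", alternating)]:
    print(name, shannon_entropy(seq), gini_impurity(seq), etc(seq))
```

Both orderings have identical entropy and Gini values because those measures only see the label counts, while the assumed ETC differs between them, which is the property the permutation-sensitive trees rely on.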

Paper Structure (18 sections, 1 equation, 6 figures, 11 tables, 3 algorithms)

Figures (6)

  • Figure 1: An interpretable decision tree designed to assess liver functionality and recommend appropriate remedies in the presence of detected abnormalities.
  • Figure 2: (a) Relationship between Normalized ETC mean and Shannon entropy over permutations of a binary string of length $n=20$, with the number of zeros $k$ varying from $0$ to $20$. (b) The same relationship with $k$ varying from $1$ to $19$, which excludes the cases where the binary string consists entirely of zeros or entirely of ones.
  • Figure 3: Decision tree structure with a parent node and two child nodes (Left Child and Right Child).
  • Figure 4: (a) Left: A visual representation of the toy example provided in Table \ref{table_toy_example}. (b) Right: Decision tree using the proposed structural impurity (computed using ETC) for Permutation A.
  • Figure 7: Permutation Decision Forest. The input dataset is subjected to a large number of permutations, resulting in different orderings of the data instances. Each permutation is then used to build a permutation decision tree, with the structural impurity measure (computed using ETC) as the splitting criterion (ETC gain). The predictions of the permutation decision trees are then fed into a majority voting scheme to determine the final predicted label.
  • ...and 1 more figure
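The pipeline described in the Figure 7 caption (permute the data, fit one tree per permutation, take a majority vote) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each "tree" is reduced to a single-split stump to keep the example short, the ETC gain is assumed to be the parent's normalized ETC minus the size-weighted normalized ETC of the children, and the `etc` routine is an assumed pair-substitution version. All function names are hypothetical; only the default of 21 trees comes from the text.

```python
import random
from collections import Counter


def etc(labels):
    """Assumed ETC: pair-substitution steps until the sequence is constant
    or has length 1 (non-negative integer labels assumed)."""
    seq, steps = list(labels), 0
    next_symbol = max(seq) + 1
    while len(seq) > 1 and len(set(seq)) > 1:
        counts = Counter(zip(seq, seq[1:]))
        target = max(counts, key=counts.get)
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == target:
                out.append(next_symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq, next_symbol, steps = out, next_symbol + 1, steps + 1
    return steps


def normalized_etc(labels):
    """ETC divided by (length - 1); 0.0 for a single-element sequence."""
    return etc(labels) / (len(labels) - 1) if len(labels) > 1 else 0.0


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y):
    """One-split stand-in for a permutation decision tree: choose the
    (feature, threshold) with the largest assumed ETC gain."""
    parent = normalized_etc(y)
    best_gain, stump = float("-inf"), (None, None, majority(y), majority(y))
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            # Children keep the row order of this permutation.
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            child = (len(left) * normalized_etc(left)
                     + len(right) * normalized_etc(right)) / len(y)
            if parent - child > best_gain:
                best_gain = parent - child
                stump = (f, t, majority(left), majority(right))
    return stump


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    if f is None:  # no valid split was found
        return left_label
    return left_label if row[f] <= t else right_label


def fit_permutation_forest(X, y, n_trees=21, seed=0):
    """Fit one stump per random permutation of the rows; no feature
    sub-sampling and no row sub-sampling."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    forest = []
    for _ in range(n_trees):
        rng.shuffle(idx)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest


def predict_forest(forest, row):
    """Majority vote over the individual predictions."""
    return majority([predict_stump(s, row) for s in forest])


X = [[0.1], [0.2], [0.3], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_permutation_forest(X, y)
print(predict_forest(forest, [0.15]), predict_forest(forest, [0.95]))
```

Every estimator sees all rows and all features, so diversity comes only from row order, which matters because the assumed ETC of a node's label sequence depends on that order.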