Heterogeneous Random Forest

Ye-eun Kim, Seoung Yun Kim, Hyunjoong Kim

TL;DR

Heterogeneous RF (HRF) is a random forest variant that increases tree diversity by deliberately introducing heterogeneity during tree construction. Simulation studies show that it mitigates the selection bias of trees within the ensemble, increases ensemble diversity, and performs better on datasets with fewer noise features.

Abstract

Random forest (RF) is one of the most widely used machine learning methods for classification. The effectiveness of RF hinges on two key factors: the accuracy of the individual trees and the diversity among them. In this study, we introduce a novel approach called heterogeneous RF (HRF), designed to increase tree diversity in a meaningful way. This diversification is achieved by deliberately introducing heterogeneity during tree construction. Specifically, features used for splitting near the root node of previous trees are assigned lower weights when constructing the feature subspace of subsequent trees. As a result, features that dominated the prior trees are less likely to be used in the next iteration, leading to a more diverse set of splitting features at the nodes. Simulation studies confirmed that HRF mitigates the selection bias of trees within the ensemble, increases the diversity of the ensemble, and performs better on datasets with a smaller proportion of noise features. To compare HRF with other widely adopted ensemble methods, we conducted tests on 52 datasets comprising both real-world and synthetic data. HRF achieved higher accuracy than the other ensemble methods on the majority of these datasets.
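The weighting idea described in the abstract can be illustrated with a short Python sketch. The abstract does not give HRF's actual weighting formula, so the depth-based penalty `1 - decay / (d + 1)`, the `decay` parameter, and the helper names `node_depths`, `update_weights`, and `sample_subspace` are all hypothetical. The sketch also assumes that one weighted feature subspace is drawn per tree; whether HRF samples per tree or per node is an assumption here. The array encoding (child index -1 for leaves, a negative feature index at leaf nodes) follows what scikit-learn exposes through `tree_.children_left`, `tree_.children_right`, and `tree_.feature`, so the helpers could be fed a fitted tree's structure, but the demo uses a hand-built toy tree.

```python
import numpy as np


def node_depths(children_left, children_right):
    """Depth of every node in an array-encoded binary tree (root = node 0, depth 0).

    Leaves are marked with child index -1, the same encoding scikit-learn uses
    in ``tree_.children_left`` / ``tree_.children_right``.
    """
    depths = np.zeros(len(children_left), dtype=int)
    stack = [0]
    while stack:
        node = stack.pop()
        for child in (children_left[node], children_right[node]):
            if child != -1:
                depths[child] = depths[node] + 1
                stack.append(child)
    return depths


def update_weights(weights, split_feature, depths, decay=0.5):
    """Down-weight features that split near the root of the previous tree.

    HYPOTHETICAL rule (not the paper's formula): a feature whose shallowest
    split is at depth d has its weight multiplied by 1 - decay / (d + 1),
    so a root split (d = 0) is penalized most. Leaf nodes carry a negative
    feature index and are skipped. Weights are renormalized to sum to 1.
    """
    w = np.asarray(weights, dtype=float).copy()
    shallowest = {}
    for f, d in zip(split_feature, depths):
        if f >= 0:  # internal (splitting) node
            shallowest[f] = min(d, shallowest.get(f, d))
    for f, d in shallowest.items():
        w[f] *= 1.0 - decay / (d + 1)
    return w / w.sum()


def sample_subspace(weights, k, rng):
    """Draw k distinct feature indices with probability proportional to weights."""
    return rng.choice(len(weights), size=k, replace=False, p=weights)


if __name__ == "__main__":
    # Toy previous tree: node 0 splits on feature 2, node 1 on feature 0,
    # nodes 2, 3 and 4 are leaves.
    children_left = np.array([1, 3, -1, -1, -1])
    children_right = np.array([2, 4, -1, -1, -1])
    split_feature = np.array([2, 0, -2, -2, -2])

    depths = node_depths(children_left, children_right)
    weights = update_weights(np.full(4, 0.25), split_feature, depths)
    subspace = sample_subspace(weights, 2, np.random.default_rng(0))
    print("weights:", np.round(weights, 4))
    print("next tree's feature subspace:", subspace)
```

In the toy tree, feature 2 splits at the root and feature 0 at depth 1, so after the update feature 2 has the lowest weight and feature 0 the next lowest, while the unused features 1 and 3 share the highest weight. The next tree's subspace is therefore less likely to contain the previous tree's dominant features, which is the qualitative behavior the abstract describes.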

Paper Structure

This paper contains 15 sections, 6 equations, 6 figures, 6 tables, and 3 algorithms.

Figures (6)

  • Figure 1: Decision tree examples. The feature inside each node is its split variable, and the features in brackets next to a node are its candidate features.
  • Figure 2: Trees for toy example
  • Figure 3: Box plot of feature depths
  • Figure 4: Dissimilarity of trees by ensemble methods.
  • Figure 5: Accuracy differences between HRF and bagging, and between HRF and RF, based on the proportion of noise features. A positive value indicates that the HRF model exhibits superior accuracy.
  • ...and 1 more figure