A novel gradient-based method for decision trees optimizing arbitrary differential loss functions

Andrei V. Konstantinov, Lev V. Utkin

TL;DR

This work addresses the rigidity of traditional decision trees that rely on heuristic splits by introducing a gradient-based method that directly optimizes arbitrary differentiable loss functions. The approach refines leaf predictions using the first and second derivatives of the loss, enabling accurate handling of classification, regression, and survival analysis with censored data, and it supports integration with neural networks. The authors formalize the problem, derive closed-form leaf updates, and implement a linear-time split evaluation scheme that updates derivatives at each split, enabling large learning rates and dynamic loss functions. Empirical results show that the proposed Decision Tree with Loss Function (DTLF) often outperforms CART and Extremely Randomized Trees across tasks and datasets, with publicly available code to facilitate adoption and extension into ensembles and hybrid models with neural networks. This work broadens the applicability of tree-based models to complex, differentiable loss landscapes and paves the way for interpretable, high-performing hybrids in structured data contexts.
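To make the idea of derivative-driven split evaluation concrete, here is a minimal sketch in Python with NumPy. It is not the authors' DTLF implementation: the function names `newton_leaf_value` and `best_split_1d`, the gain formula (the standard second-order approximation used in Newton-style tree methods), and the toy data are all assumptions made for illustration. Given per-sample first derivatives `g` and second derivatives `h` of a loss at the current predictions, every threshold of one feature can be scored in a single pass over prefix sums once the feature is sorted.

```python
import numpy as np


def newton_leaf_value(g, h, eps=1e-12):
    """Newton-step estimate of the constant correction for one leaf.

    Assumption: the leaf value minimizes a second-order Taylor
    approximation of the summed loss, giving -sum(g) / sum(h).
    """
    return -g.sum() / (h.sum() + eps)


def best_split_1d(x, g, h, eps=1e-12):
    """Score all thresholds of a single feature with prefix sums.

    Sorting costs O(n log n); the scan over candidate thresholds is O(n).
    Returns (threshold, gain), or None if all feature values are equal.
    """
    order = np.argsort(x, kind="stable")
    xs, gs, hs = x[order], g[order], h[order]

    # Prefix sums give the left-child totals for every split position.
    G_left = np.cumsum(gs)[:-1]
    H_left = np.cumsum(hs)[:-1]
    G_total, H_total = gs.sum(), hs.sum()
    G_right = G_total - G_left
    H_right = H_total - H_left

    # Approximate loss reduction from replacing one Newton-updated leaf
    # with two Newton-updated children (hypothetical scoring rule).
    gain = 0.5 * (
        G_left**2 / (H_left + eps)
        + G_right**2 / (H_right + eps)
        - G_total**2 / (H_total + eps)
    )

    # A threshold is only meaningful between two distinct feature values.
    valid = xs[1:] > xs[:-1]
    if not valid.any():
        return None
    gain = np.where(valid, gain, -np.inf)

    i = int(np.argmax(gain))
    return 0.5 * (xs[i] + xs[i + 1]), float(gain[i])


# Toy example with squared-error loss 0.5 * (f - y)^2 at f = 0,
# so g = f - y = -y and h = 1 for every sample.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.0, 10.0, 10.0])
g, h = -y, np.ones_like(y)

threshold, gain = best_split_1d(x, g, h)
right_value = newton_leaf_value(g[x > threshold], h[x > threshold])
```

On this toy data the scan picks the threshold between 2 and 3, and the Newton value of the right child equals the mean target of its samples, which is what squared-error loss should give. In a scheme that updates derivatives at each split, `g` and `h` would be recomputed from the refreshed predictions before the children are split further.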

Abstract

There are many approaches for training decision trees. This work introduces a novel gradient-based method for constructing decision trees that optimize arbitrary differentiable loss functions. Unlike traditional approaches that rely on heuristic splitting rules, the proposed method refines predictions using the first and second derivatives of the loss function. We demonstrate the method's applicability to classification, regression, and survival analysis tasks, including those with censored data. Numerical experiments on both real and synthetic datasets compare the proposed method with traditional decision tree algorithms, such as CART, Extremely Randomized Trees, and SurvTree. The implementation of the method is publicly available, providing a practical tool for researchers and practitioners. This work advances decision tree-based modeling by offering a more flexible and accurate approach for handling structured data and complex tasks. By leveraging gradient-based optimization, the proposed method bridges the gap between traditional decision trees and modern machine learning techniques, paving the way for further innovations in interpretable and high-performing models.
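As a worked illustration of how first and second derivatives can refine a leaf prediction, the standard second-order (Newton) argument is sketched below. The notation is assumed here rather than taken from the paper: $\ell$ is the loss, $f_i$ the current prediction for sample $i$, $I$ the set of samples in a leaf, and $c$ the constant correction applied to that leaf. The paper's own derivation may differ in detail.

```latex
\begin{align*}
\sum_{i \in I} \ell(y_i, f_i + c)
  &\approx \sum_{i \in I} \Big[ \ell(y_i, f_i) + g_i\, c + \tfrac{1}{2} h_i\, c^{2} \Big],
  \qquad
  g_i = \frac{\partial \ell(y_i, f)}{\partial f}\Big|_{f = f_i},
  \quad
  h_i = \frac{\partial^{2} \ell(y_i, f)}{\partial f^{2}}\Big|_{f = f_i}, \\
c^{*} &= -\frac{\sum_{i \in I} g_i}{\sum_{i \in I} h_i}
  \qquad \text{(a minimizer when } \textstyle\sum_{i \in I} h_i > 0\text{)}.
\end{align*}
```

For squared-error loss $\ell(y, f) = \tfrac{1}{2}(f - y)^{2}$ this gives $g_i = f_i - y_i$ and $h_i = 1$, so $c^{*}$ is the mean residual in the leaf, which matches the usual regression-tree leaf value.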

Paper Structure

This paper contains 11 sections, 43 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: The classification accuracy (ROC-AUC) as a function of the tree depth for different models trained on the BC dataset
  • Figure 2: The regression accuracy ($R^{2}$) as a function of the tree depth for CART and DTLF trained on six datasets
  • Figure 3: The regression accuracy ($R^{2}$) as a function of the parameter $k$ for CART, ERT, and DTLF trained on the Friedman1 dataset
  • Figure 4: The regression accuracy ($R^{2}$) as a function of the sample size $n$ for CART and DTLF trained on the Friedman1 dataset
  • Figure 5: The survival regression accuracy (C-index) as a function of the tree depth for the survival tree (SurvTree) and DTLF trained on six datasets
  • ...and 1 more figure