Team up GBDTs and DNNs: Advancing Efficient and Effective Tabular Prediction with Tree-hybrid MLPs

Jiahuan Yan, Jintai Chen, Qianxing Wang, Danny Z. Chen, Jian Wu

TL;DR

The paper addresses the split between GBDTs and DNNs for tabular data by introducing T-MLP, a tree-hybrid simple MLP that combines a GBDT-derived feature gate with a sparsity-driven MLP core. Its key components are a tensorized GBDT feature gate, a compact MLP block with a spatial gating unit (SGU), and user-controlled pruning to enforce sparsity, all trained end-to-end with back-propagation and optionally ensembled in parallel. Across 88 datasets drawn from multiple benchmarks, T-MLP achieves competitive results with substantially reduced training time and a compact model size, often matching or surpassing extensively tuned DNNs and remaining competitive with tuned GBDTs. The work also provides interpretability insights and an open-source implementation to promote broader adoption in economical tabular prediction.

Abstract

Tabular datasets play a crucial role in various applications, so developing efficient, effective, and widely compatible prediction algorithms for tabular data is important. Currently, two prominent model types, Gradient Boosted Decision Trees (GBDTs) and Deep Neural Networks (DNNs), have demonstrated performance advantages on distinct tabular prediction tasks. However, selecting an effective model for a specific tabular dataset is challenging and often demands time-consuming hyperparameter tuning. To address this model selection dilemma, this paper proposes a new framework that combines the advantages of both GBDTs and DNNs, resulting in a DNN algorithm that is as efficient as GBDTs and is competitively effective regardless of whether a dataset favors GBDTs or DNNs. Our idea is rooted in the observation that deep learning (DL) offers a larger parameter space that can represent a well-performing GBDT model, yet the current back-propagation optimizer struggles to efficiently discover such optimal functionality. On the other hand, during GBDT development, hard tree pruning, entropy-driven feature gating, and model ensembling have proved to be more adaptable to tabular data. By combining these key components, we present a Tree-hybrid simple MLP (T-MLP). In our framework, a tensorized, rapidly trained GBDT feature gate, a DNN architecture pruning approach, and a vanilla back-propagation optimizer collaboratively train a randomly initialized MLP model. Comprehensive experiments show that T-MLP is competitive with extensively tuned DNNs and GBDTs on their respective dominant tabular benchmarks (88 datasets), while using compact model storage and significantly reduced training time.
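To make the two tree-inspired ingredients named in the abstract more concrete, here is a minimal pure-Python sketch, not the authors' implementation. It rests on assumptions that do not come from the paper: the feature gate is modeled as a binary per-sample mask over the features tested on that sample's decision path in a toy hand-written tree, and plain magnitude pruning stands in for the paper's pruning approach. All function names (`feature_gate`, `magnitude_prune`, `forward`) and the tree encoding are hypothetical.

```python
from typing import List, Optional

# Toy tree node: (feature_index, threshold, left_child, right_child); a leaf is None.
# Hypothetical encoding for illustration only, not the paper's tensorized GBDT.
Tree = Optional[tuple]


def path_features(tree: Tree, x: List[float]) -> set:
    """Collect the feature indices tested along the decision path of sample x."""
    used = set()
    node = tree
    while node is not None:
        feat, thr, left, right = node
        used.add(feat)
        node = left if x[feat] <= thr else right
    return used


def feature_gate(trees: List[Tree], x: List[float]) -> List[float]:
    """Per-sample binary mask: 1.0 for every feature some tree tested on x's path."""
    used = set()
    for tree in trees:
        used |= path_features(tree, x)
    return [1.0 if i in used else 0.0 for i in range(len(x))]


def magnitude_prune(weights: List[List[float]], sparsity: float) -> List[List[float]]:
    """Zero the smallest-magnitude fraction `sparsity` of the weights.

    Assumption: magnitude pruning is a stand-in for the paper's pruning method.
    """
    cells = sorted((abs(w), r, c) for r, row in enumerate(weights) for c, w in enumerate(row))
    k = int(len(cells) * sparsity)  # number of weights to drop
    pruned = [row[:] for row in weights]
    for _, r, c in cells[:k]:
        pruned[r][c] = 0.0
    return pruned


def forward(x: List[float], gate: List[float], w1: List[List[float]], w2: List[float]) -> float:
    """Gate the input, then apply a one-hidden-layer ReLU MLP with a scalar output."""
    gated = [g * xi for g, xi in zip(gate, x)]
    hidden = [max(0.0, sum(w * v for w, v in zip(row, gated))) for row in w1]
    return sum(w * h for w, h in zip(w2, hidden))


# Usage with made-up numbers: one tree that splits on feature 0, then feature 2.
toy_trees = [(0, 0.5, None, (2, 1.0, None, None))]
sample = [0.9, 3.0, 0.2]
gate = feature_gate(toy_trees, sample)
w1_sparse = magnitude_prune([[0.5, -2.0, 0.1], [-0.05, 1.0, 0.8]], sparsity=0.5)
prediction = forward(sample, gate, w1_sparse, [1.0, 2.0])
```

In this toy setup the gate depends on the sample, so two rows can expose different feature subsets to the same MLP, while the `sparsity` argument plays the role of the user-controlled pruning level mentioned in the TL;DR.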

Paper Structure (22 sections, 4 equations, 4 figures, 14 tables)

Figures (4)

  • Figure 1: Our proposed T-MLP vs. existing tabular prediction approaches: GBDTs and DNNs. (a) GBDTs are classical non-deep-learning models for tabular prediction. (b) DNNs are emerging promising methods especially for large-scale, complex, cross-table scenarios. (c) T-MLP is a hybrid framework that integrates the strengths of both GBDTs and DNNs, accomplished via GBDT feature gate tensorization, MLP framework pruning, simple block ensemble, and end-to-end back-propagation. It yields competitive results on both DNN- and GBDT-favored datasets, with a rapid development process and compact model size.
  • Figure 2: The winning rates of GBDTs and DNNs on three benchmarks, where a framework's winning rate is the proportion of datasets in a benchmark on which it achieves the best performance. The rates show that framework preferences vary among the datasets used in different tabular prediction works.
  • Figure 3: Performance on the Adult and Year datasets as T-MLP sparsity varies. All the best results are achieved with suitable sparsity.
  • Figure 4: Decision boundary visualization of FT-Transformer (FT-T), XGBoost, and a single-block T-MLP on the Bioresponse and Credit-g datasets, using the two most important features. Different colors represent distinct categories, while the shade of each color indicates the predicted probability.
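
The winning rate described in the Figure 2 caption can be computed as in the sketch below. It is a minimal illustration under stated assumptions: the metric is "higher is better", ties go to the first framework listed, and the dataset names and scores are made-up placeholders rather than results from the paper. The function name `winning_rates` is hypothetical.

```python
from collections import Counter
from typing import Dict


def winning_rates(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Fraction of datasets on which each framework has the best score.

    scores[dataset][framework] is a metric where higher is better (assumption).
    On a tie, the framework listed first for that dataset gets the win.
    """
    wins = Counter()
    for per_framework in scores.values():
        best = max(per_framework, key=per_framework.get)
        wins[best] += 1
    frameworks = sorted({f for per in scores.values() for f in per})
    return {f: wins[f] / len(scores) for f in frameworks}


# Placeholder scores for four hypothetical datasets (not taken from the paper).
toy_scores = {
    "dataset_a": {"GBDT": 0.91, "DNN": 0.89},
    "dataset_b": {"GBDT": 0.75, "DNN": 0.80},
    "dataset_c": {"GBDT": 0.66, "DNN": 0.61},
    "dataset_d": {"GBDT": 0.84, "DNN": 0.83},
}
rates = winning_rates(toy_scores)
```

Computing the rate separately for each benchmark is what lets the differing framework preferences show up, since a pooled rate would hide which dataset collections favor which model family.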