GANDALF: Gated Adaptive Network for Deep Automated Learning of Features

Manu Joseph, Harsh Raj

TL;DR

GANDALF targets the gap between deep learning and gradient-boosted methods on tabular data by introducing Gated Feature Learning Units (GFLUs) with learnable feature masks and a gating mechanism. The architecture stacks multiple GFLUs to build a hierarchical feature representation, which is then passed through a lightweight MLP for prediction, achieving strong accuracy with fewer parameters and reduced compute. The work provides extensive public-benchmark validation (Tabular Benchmark and TabSurvey) and offers interpretability through aggregated feature masks and fidelity analyses with GradientSHAP and DeepLIFT. The authors also share an open-source PyTorch Tabular implementation under MIT, enabling practical adoption and further research on tabular deep learning models.

Abstract

We propose a novel high-performance, interpretable, parameter-efficient, and computationally efficient deep learning architecture for tabular data: Gated Adaptive Network for Deep Automated Learning of Features (GANDALF). GANDALF relies on a new tabular processing unit, the Gated Feature Learning Unit (GFLU), which combines a gating mechanism with built-in feature selection and serves as the feature representation learning unit. In experiments on multiple established public benchmarks, we demonstrate that GANDALF outperforms or performs on par with SOTA approaches such as XGBoost, SAINT, and FT-Transformers. The code is available at github.com/manujosephv/pytorch_tabular under the MIT License.
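The description above (learnable feature masks, a gating mechanism, stacked units, and a light prediction head) can be illustrated with a small forward-pass sketch. This is not the authors' implementation: the GRU-style update and reset gates, the plain softmax used for the feature mask, the single linear layer standing in for the MLP head, and every name in the code (`GFLUStage`, `gandalf_forward`, `aggregated_mask`) are assumptions made for illustration only. The PyTorch Tabular repository remains the reference for the actual model.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

class GFLUStage:
    """One hypothetical gated feature learning stage (assumed GRU-like form)."""

    def __init__(self, n_features, rng):
        d = n_features
        # Feature mask logits; these would be learned, here they are random.
        self.mask_logits = rng.normal(size=d)
        self.W_z = rng.normal(scale=0.1, size=(2 * d, d))
        self.b_z = np.zeros(d)
        self.W_r = rng.normal(scale=0.1, size=(2 * d, d))
        self.b_r = np.zeros(d)
        self.W_o = rng.normal(scale=0.1, size=(2 * d, d))
        self.b_o = np.zeros(d)

    def mask(self):
        # Assumption: a plain softmax over features acts as a soft selector.
        return softmax(self.mask_logits)

    def __call__(self, x, h):
        x_sel = x * self.mask()                      # soft feature selection
        gate_in = np.concatenate([x_sel, h], axis=1)
        z = sigmoid(gate_in @ self.W_z + self.b_z)   # update gate
        r = sigmoid(gate_in @ self.W_r + self.b_r)   # reset gate
        cand_in = np.concatenate([r * h, x], axis=1)
        cand = np.tanh(cand_in @ self.W_o + self.b_o)
        # Element-wise multiplication and addition blend old and new state.
        return (1.0 - z) * h + z * cand

def gandalf_forward(x, stages, W_head, b_head):
    h = x  # the representation starts as the raw features
    for stage in stages:
        h = stage(x, h)
    # Assumption: a single linear layer stands in for the lightweight MLP.
    return h @ W_head + b_head

def aggregated_mask(stages):
    # Averaging the per-stage masks gives one importance score per feature.
    return np.mean([s.mask() for s in stages], axis=0)

rng = np.random.default_rng(0)
n_features = 5
stages = [GFLUStage(n_features, rng) for _ in range(3)]
W_head = rng.normal(scale=0.1, size=(n_features, 1))
b_head = np.zeros(1)
x = rng.normal(size=(4, n_features))  # a batch of 4 rows
out = gandalf_forward(x, stages, W_head, b_head)
importance = aggregated_mask(stages)
```

Here `out` holds one prediction per row, and `importance` is a per-feature score that sums to one, which mirrors the idea of reading interpretability off the aggregated feature masks.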

Paper Structure (19 sections, 15 equations, 9 figures, 7 tables)

Figures (9)

  • Figure 1: GRU vs GFLU - A comparison
  • Figure 2: Detailed view of the Gated Feature Learning Unit. $\otimes$ represents element-wise multiplication and $\oplus$ represents addition.
  • Figure 3: Benchmarking on Tabular Benchmark [leogrin_benchmark]. The box plot of the normalized test scores across datasets shows that GANDALF consistently achieves the best scores. The box plots of the number of parameters and MACs (Multiply-Accumulate Operations) show that GANDALF achieves this performance with higher parameter efficiency and computational efficiency.
  • Figure 4: Benchmarking on TabSurvey [tabsurvey]. The box plot of the normalized test scores across 4 datasets shows GANDALF on par with GBDTs and better than other DL models.
  • Figure 5: Hyperparameter Study. a. The box plot of hyperparameter importances across 18 datasets. b - d. Histograms of hyperparameters where each bin is colored according to the average test scores in that bin. Green is higher and Red is lower.
  • ...and 4 more figures