GANDALF: Gated Adaptive Network for Deep Automated Learning of Features

Manu Joseph; Harsh Raj

TL;DR

GANDALF targets the gap between deep learning and gradient-boosted methods on tabular data by introducing Gated Feature Learning Units (GFLUs) with learnable feature masks and a gating mechanism. The architecture stacks multiple GFLUs to build a hierarchical feature representation, which is then passed through a lightweight MLP for prediction, achieving strong accuracy with fewer parameters and reduced compute. The work provides extensive public-benchmark validation (Tabular Benchmark and TabSurvey) and offers interpretability through aggregated feature masks and fidelity analyses with GradientSHAP and DeepLIFT. The authors also share an open-source PyTorch Tabular implementation under MIT, enabling practical adoption and further research on tabular deep learning models.

Abstract

We propose GANDALF (Gated Adaptive Network for Deep Automated Learning of Features), a novel deep learning architecture for tabular data that is high-performing, interpretable, and efficient in both parameters and computation. GANDALF relies on a new tabular processing unit, the Gated Feature Learning Unit (GFLU), which combines a gating mechanism with built-in feature selection and serves as the feature representation learning unit. Experiments on multiple established public benchmarks show that GANDALF outperforms or is on par with state-of-the-art approaches such as XGBoost, SAINT, and FT-Transformers. The code is available at github.com/manujosephv/pytorch_tabular under the MIT License.
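To make the idea of stacked gated units with learnable feature masks more concrete, here is a minimal, dependency-free sketch of a forward pass. It is an illustration under stated assumptions, not the authors' implementation. The class `GFLUStage`, the helpers `gflu_stack` and `aggregated_mask`, the GRU-style update and reset gates, the plain softmax mask, and the averaging of masks across stages are all hypothetical choices made for this example. The final MLP head and all training logic are omitted.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GFLUStage:
    """One hypothetical gated stage (assumed GRU-like, not the paper's exact unit)."""

    def __init__(self, n_features, rng):
        # Learnable mask logits: one score per input feature.
        self.mask_logits = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        # Gate weights act on the concatenation [selected features; hidden state].
        self.w_update = random_matrix(n_features, 2 * n_features, rng)
        self.w_reset = random_matrix(n_features, 2 * n_features, rng)
        self.w_candidate = random_matrix(n_features, 2 * n_features, rng)

    def feature_mask(self):
        # Assumption: a plain softmax stands in for whatever mask
        # transformation the paper actually uses.
        return softmax(self.mask_logits)

    def forward(self, x, hidden):
        mask = self.feature_mask()
        selected = [m * xi for m, xi in zip(mask, x)]  # soft feature selection
        update = [sigmoid(a) for a in matvec(self.w_update, selected + hidden)]
        reset = [sigmoid(a) for a in matvec(self.w_reset, selected + hidden)]
        gated_hidden = [r * h for r, h in zip(reset, hidden)]
        candidate = [
            math.tanh(a) for a in matvec(self.w_candidate, gated_hidden + selected)
        ]
        # Convex blend of the previous representation and the new candidate.
        return [(1.0 - z) * h + z * c for z, h, c in zip(update, hidden, candidate)]


def gflu_stack(x, stages):
    """Run the input through every stage; each stage sees the raw features again."""
    hidden = list(x)
    for stage in stages:
        hidden = stage.forward(x, hidden)
    return hidden


def aggregated_mask(stages):
    """Assumed aggregation: average the per-stage masks into one importance vector."""
    n_features = len(stages[0].mask_logits)
    totals = [0.0] * n_features
    for stage in stages:
        for i, m in enumerate(stage.feature_mask()):
            totals[i] += m
    return [t / len(stages) for t in totals]


if __name__ == "__main__":
    rng = random.Random(0)
    stages = [GFLUStage(4, rng) for _ in range(3)]
    representation = gflu_stack([0.5, -1.0, 2.0, 0.0], stages)
    print("representation:", representation)
    print("feature importance:", aggregated_mask(stages))
```

In this sketch each stage re-reads the original features through its own mask, so different stages can attend to different feature subsets while the hidden state carries the hierarchical representation forward. In a full model, a small MLP would consume that representation to produce the prediction.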

Paper Structure (19 sections, 15 equations, 9 figures, 7 tables)

Introduction
Related Work
Gated Adaptive Network for Deep Automated Learning of Features (GANDALF)
Gated Feature Learning Units (GFLU)
Feature Selection
Gating Mechanism
Network Architecture and Initialization
Interpretability
Experiments and Analysis
Comparison with Public Benchmarks
Tabular Benchmark
TabSurvey Benchmark
Results
Hyperparameter Study
Interpretability
...and 4 more sections

Figures (9)

Figure 1: GRU vs GFLU - A comparison
Figure 2: Detailed view of the Gated Feature Learning Unit. $\otimes$ represents element-wise multiplication and $\oplus$ represents addition.
Figure 3: Benchmarking on Tabular Benchmark [leogrin_benchmark]. The box plot of the normalized test scores across datasets shows that GANDALF consistently achieves the best scores. The box plots of the number of parameters and MACs (Multiply-Accumulate Operations) show that GANDALF achieves this performance with higher parameter efficiency and computational efficiency.
Figure 4: Benchmarking on TabSurvey [tabsurvey]. The box plot of the normalized test scores across 4 datasets shows GANDALF on par with GBDTs and better than other DL models.
Figure 5: Hyperparameter Study. (a) Box plot of hyperparameter importances across 18 datasets. (b-d) Histograms of hyperparameters, where each bin is colored according to the average test score in that bin. Green indicates higher scores and red indicates lower scores.
...and 4 more figures
