Mambular: A Sequential Model for Tabular Deep Learning

Anton Frederik Thielmann; Manish Kumar; Christoph Weisser; Arik Reuter; Benjamin Säfken; Soheila Samiee

TL;DR

The paper addresses the difficulty of applying deep learning to tabular data by introducing Mambular, a tabular adaptation of Mamba that treats features as a sequence processed by autoregressive state-space model (SSM) layers. It demonstrates competitive performance against strong baselines (e.g., CatBoost, FT-Transformer) and extends to distributional regression via MambularLSS, which improves the continuous ranked probability score (CRPS) over XGBoostLSS. Through ablations, it shows how pooling, kernel size, and feature ordering influence results, with average pooling and unidirectional processing often performing best. The work provides open-source code and highlights the potential of autoregressive tabular models to scale and flexibly incorporate new features without retraining.
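To make the CRPS comparison concrete, the following is a minimal sketch of how CRPS scores a predictive distribution against one observed value. It assumes a Gaussian forecast, for which a closed form exists; the distribution families and the exact CRPS estimator used for MambularLSS are not stated in this summary, and the function name `crps_gaussian` is hypothetical.

```python
import math


def crps_gaussian(y, mu, sigma):
    """Closed-form CRPS of a N(mu, sigma^2) forecast for observation y.

    Lower is better. The Gaussian family is an assumption made for
    illustration, not a statement about the paper's setup.
    """
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


# Two forecasts centred on the observed value: a sharp one and a vague one.
sharp = crps_gaussian(y=1.0, mu=1.0, sigma=0.5)
vague = crps_gaussian(y=1.0, mu=1.0, sigma=2.0)
print(sharp, vague)
```

Both forecasts have the correct mean, but the sharper one receives the lower score, which is why CRPS rewards models that are both calibrated and confident.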

Abstract

The analysis of tabular data has traditionally been dominated by gradient-boosted decision trees (GBDTs), known for their proficiency with mixed categorical and numerical features. However, recent deep learning innovations are challenging this dominance. This paper investigates the use of autoregressive state-space models for tabular data and compares their performance against established benchmark models. Additionally, we explore various adaptations of these models, including different pooling strategies, feature interaction mechanisms, and bi-directional processing techniques to understand their effectiveness for tabular data. Our findings indicate that interpreting features as a sequence and processing them and their interactions through structured state-space layers can lead to significant performance improvement. This research underscores the versatility of autoregressive models in tabular data analysis, positioning them as a promising alternative that could substantially enhance deep learning capabilities in this traditionally challenging area. The source code is available at https://github.com/basf/mamba-tabular.
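The idea of reading the features of one table row as a sequence can be sketched with a plain linear state-space recursion. This is an illustration under simplifying assumptions, not the authors' implementation: the matrices `A`, `B`, `C` are fixed here, whereas Mamba makes its parameters input-dependent, and the function name `ssm_scan` and all shapes are hypothetical.

```python
import numpy as np


def ssm_scan(x, A, B, C):
    """Run a linear state-space recursion over the feature axis.

    x: (J, d) embeddings of one table row, one row per feature.
    A: (n, n) state transition, B: (n, d) input map, C: (d, n) output map.
    Returns (J, d) contextualized feature representations.
    """
    h = np.zeros(A.shape[0])
    out = np.empty((x.shape[0], C.shape[0]))
    for j in range(x.shape[0]):
        h = A @ h + B @ x[j]  # h_j = A h_{j-1} + B x_j
        out[j] = C @ h        # y_j = C h_j
    return out


rng = np.random.default_rng(0)
J, d, n = 5, 4, 8                   # features, embedding dim, state dim (illustrative)
x = rng.normal(size=(J, d))
A = 0.9 * np.eye(n)                 # fixed, stable transition (assumption)
B = 0.1 * rng.normal(size=(n, d))
C = 0.1 * rng.normal(size=(d, n))

y = ssm_scan(x, A, B, C)
pooled = y.mean(axis=0)             # average pooling over the feature axis
print(y.shape, pooled.shape)
```

Because the hidden state is updated one feature at a time, the representation at position j depends only on features up to j. This is why the order of the features matters for such a model.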

Paper Structure (27 sections, 12 equations, 7 figures, 26 tables)

Introduction
Methodology
MambAttention
Experiments
Results
Distributional Regression
Ablation
Model Architecture
Sequence ordering
Limitations
Conclusion
Results
Datasets
Sequence ordering
California Housing
...and 12 more sections

Figures (7)

Figure 1: Generation of the input matrix that is fed through the Mamba blocks. The categorical features are tokenized and embedded, similar to classical embeddings for language models. The numerical features are encoded and embedded via a simple linear layer. The final input matrix of the Mamba blocks is the concatenation of the embeddings, $\mathbf{z} \in \mathbb{R}^{N \times J \times d}$, with embedding dimension $d$.
Figure 2: SSM updating step with recursive update of $h$: The hidden state is iteratively updated by going through the sequence (features) similar to a recurrent neural network. The final representation is generated as described in Equations 3-4.
Figure 3: The forward pass of a single sequence in the model. After embedding the inputs, the embeddings are passed to several Mamba blocks. The tabular head is a single task-specific output layer. Before being passed to the linear layer, the contextualized embeddings are pooled via average pooling. For bidirectional processing, a second block with a flipped sequence is used, and the learnable matrices are not shared between the directions.
Figure 4: Critical difference diagram for all models on the benchmark datasets reported in Table \ref{tab:all_results}. The average ranks across tasks are shown in brackets next to each model. Horizontal lines indicate groups of models with no statistically significant differences in performance. Mambular achieves the second-best average rank, with no significant difference at the 5% level from the best-performing model, CatBoost. Notably, the top four models do not exhibit statistically significant differences from one another. The critical differences are computed using the Conover-Friedman test \cite{pereira2015overview}, as both average ranks and performance metrics across all datasets are available.
Figure 5: Critical difference diagram for the best-performing models on the additional results reported in Table \ref{tab:add_results}. The average ranks across tasks are shown in brackets next to each model. Horizontal lines indicate groups of models with no statistically significant differences in performance. Mambular and CatBoost share the best average rank and significantly outperform the other models. The critical differences are computed using the Conover-Friedman test \cite{pereira2015overview}, as both average ranks and performance metrics across all datasets are available.
...and 2 more figures
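
The captions of Figures 1 and 3 describe a pipeline of embedding, sequence blocks, average pooling, and a linear head. The sketch below traces the tensor shapes through that pipeline under stated assumptions. The Mamba blocks are replaced by a parameter-free stand-in (`block`, a causal running mean), the order in which categorical and numerical embeddings are concatenated is assumed, and summing the two directions is only one possible way to combine them. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 3, 4  # batch size and embedding dimension (illustrative)

# Toy table: two numerical columns and two integer-coded categorical columns.
x_num = rng.normal(size=(N, 2))
x_cat = np.array([[0, 2], [1, 0], [2, 1]])
cardinalities = [3, 3]

# One linear map per numerical feature (scalar -> R^d).
W_num = rng.normal(size=(2, d))
b_num = rng.normal(size=(2, d))
# One embedding table per categorical feature (category index -> R^d).
E_cat = [rng.normal(size=(c, d)) for c in cardinalities]

z_num = x_num[:, :, None] * W_num[None] + b_num[None]                # (N, 2, d)
z_cat = np.stack([E_cat[k][x_cat[:, k]] for k in range(2)], axis=1)  # (N, 2, d)
z = np.concatenate([z_cat, z_num], axis=1)                           # (N, J, d), J = 4


def block(seq):
    """Stand-in for the Mamba blocks: a causal running mean over features."""
    steps = np.arange(1, seq.shape[1] + 1)[None, :, None]
    return np.cumsum(seq, axis=1) / steps


fwd = block(z)                       # left-to-right pass
bwd = block(z[:, ::-1])[:, ::-1]     # flipped pass, flipped back to align positions
pooled = (fwd + bwd).mean(axis=1)    # average pooling over features -> (N, d)

w_head = rng.normal(size=(d, 1))     # task-specific linear head (regression here)
y_hat = pooled @ w_head              # (N, 1)
print(z.shape, pooled.shape, y_hat.shape)
```

In the model described by Figure 3, each direction has its own learnable matrices. The stand-in above has no parameters, so it shows only the data flow and the flip, not the learned behaviour.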
