Arithmetic Feature Interaction Is Necessary for Deep Tabular Learning

Yi Cheng, Renjun Hu, Haochao Ying, Xing Shi, Jian Wu, Wei Lin

TL;DR

This work tackles the question of what inductive biases enable deep learning on tabular data, proposing that arithmetic feature interaction is necessary for effective modeling. It introduces AMFormer, a transformer-based architecture that uses parallel additive and multiplicative attention and prompt-based optimization to engineer arithmetical interactions among features. Empirical results on synthetic data show substantial gains in fine-grained modeling, data efficiency, and generalization, while real-world experiments across four tabular datasets corroborate the approach and show consistent improvements over strong baselines. The study suggests a strong inductive bias for deep tabular learning and provides a practical, scalable module that can enhance existing transformer-based models.

Abstract

Until recently, the question of the effective inductive bias of deep models on tabular data has remained unanswered. This paper investigates the hypothesis that arithmetic feature interaction is necessary for deep tabular learning. To test this point, we create a synthetic tabular dataset with a mild feature interaction assumption and examine a modified transformer architecture enabling arithmetical feature interactions, referred to as AMFormer. Results show that AMFormer outperforms strong counterparts in fine-grained tabular data modeling, data efficiency in training, and generalization. This is attributed to its parallel additive and multiplicative attention operators and prompt-based optimization, which facilitate the separation of tabular samples in an extended space with arithmetically-engineered features. Our extensive experiments on real-world data also validate the consistent effectiveness, efficiency, and rationale of AMFormer, suggesting it has established a strong inductive bias for deep learning on tabular data. Code is available at https://github.com/aigc-apps/AMFormer.
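The abstract names parallel additive and multiplicative attention operators but does not say how they are computed. The sketch below is only an illustration of the idea. It assumes strictly positive feature values and uses a log-space weighted sum to obtain a weighted product, which is one possible way to get multiplicative mixing and is not taken from this page. The function names `additive_mix`, `multiplicative_mix`, and `parallel_mix` are hypothetical.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax over the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=axis, keepdims=True)


def additive_mix(values, weights):
    """Weighted sum of feature rows: out_i = sum_j weights[i, j] * values[j]."""
    return weights @ values


def multiplicative_mix(values, weights):
    """Weighted product of feature rows, computed in log space.

    Uses exp(sum_j w_ij * log v_j) = prod_j v_j ** w_ij.
    Assumption: every entry of `values` is strictly positive.
    """
    return np.exp(weights @ np.log(values))


def parallel_mix(values, scores):
    """Run both operators on the same attention weights and concatenate them."""
    weights = softmax(scores)
    return np.concatenate(
        [additive_mix(values, weights), multiplicative_mix(values, weights)],
        axis=-1,
    )


if __name__ == "__main__":
    # Three scalar features (one embedding dimension each), values 2, 3, 4.
    values = np.array([[2.0], [3.0], [4.0]])
    # Hard weights selecting the first two features.
    hard_weights = np.array([[1.0, 1.0, 0.0]])
    print(additive_mix(values, hard_weights))        # sum of features 1 and 2
    print(multiplicative_mix(values, hard_weights))  # product of features 1 and 2
    # Uniform attention scores give an arithmetic and a geometric mean.
    print(parallel_mix(values, np.zeros((1, 3))))
```

With hard 0/1 weights the two operators reduce to a plain sum and a plain product of the selected features. With softmax weights they become a weighted arithmetic mean and a weighted geometric mean, which is why the two branches carry different information when run in parallel.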

Paper Structure (12 sections, 4 equations, 5 figures, 9 tables)

Figures (5)

  • Figure 1: Results on synthetic data. The $+x\%$ labels in the figure give the relative improvement of AMFormer over Transformer.
  • Figure 2: The overview of AMFormer. $L$ is the layer number.
  • Figure 3: Impacts of layer number $L$ and parameter $k$.
  • Figure 4: Impact of layer number $L$ on MI and HC.
  • Figure 5: Impact of parameter $k$ on MI and HC.