KMLP: A Scalable Hybrid Architecture for Web-Scale Tabular Data Modeling

Mingming Zhang, Pengfei Shi, Zhiqing Xiao, Feng Zhao, Guandong Sun, Yulin Kang, Ruizhe Gao, Ningtao Wang, Xing Fu, Weiqiang Wang, Junbo Zhao

TL;DR

Experiments on public benchmarks and an industrial dataset with billions of samples show KMLP achieves state-of-the-art performance, with advantages over baselines like GBDTs increasing at larger scales, validating KMLP as a scalable deep learning paradigm for large-scale web tabular data.

Abstract

Predictive modeling on web-scale tabular data with billions of instances and hundreds of heterogeneous numerical features faces significant scalability challenges. These features exhibit anisotropy, heavy-tailed distributions, and non-stationarity, which create bottlenecks for models like Gradient Boosting Decision Trees (GBDTs) and require laborious manual feature engineering. We introduce KMLP, a hybrid deep architecture that integrates a shallow Kolmogorov-Arnold Network (KAN) front-end with a Gated Multilayer Perceptron (gMLP) backbone. The KAN front-end uses learnable activation functions to automatically model complex non-linear transformations for each feature, while the gMLP backbone captures high-order interactions. Experiments on public benchmarks and an industrial dataset with billions of samples show that KMLP achieves state-of-the-art performance, with advantages over baselines like GBDTs increasing at larger scales, validating KMLP as a scalable deep learning paradigm for large-scale web tabular data.
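
To make the two-stage design concrete, the following is a minimal NumPy sketch of a forward pass, not the authors' implementation. Several choices here are assumptions made for illustration: Gaussian radial basis functions stand in for the spline bases usually used in KANs, the gated block uses a GLU-style sigmoid gate with a residual connection, inputs are assumed to be scaled to [0, 1], and all names (`init_params`, `kan_layer`, `gated_block`, `kmlp_forward`) are hypothetical. Batch Normalization, Dropout, and training are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(d_in, d_model, d_hidden, n_basis, n_blocks, seed=0):
    """Random parameters for the sketch (hypothetical layout, not from the paper)."""
    rng = np.random.default_rng(seed)
    return {
        # Basis centers span [0, 1]; assumes inputs were scaled to that range.
        "centers": np.linspace(0.0, 1.0, n_basis),
        "width": 1.0 / n_basis,
        # One coefficient vector per (input feature, output unit) edge.
        "coef": rng.normal(0.0, 0.1, size=(d_in, d_model, n_basis)),
        "blocks": [
            (rng.normal(0.0, 0.1, size=(d_model, 2 * d_hidden)),
             rng.normal(0.0, 0.1, size=(d_hidden, d_model)))
            for _ in range(n_blocks)
        ],
        "head": rng.normal(0.0, 0.1, size=d_model),
    }

def kan_layer(x, coef, centers, width):
    """KAN-style layer: each edge (i, o) applies its own learnable 1-D function.

    x: (n, d_in) -> returns (n, d_model).
    The function on edge (i, o) is sum_k coef[i, o, k] * basis_k(x_i).
    """
    basis = np.exp(-((x[..., None] - centers) / width) ** 2)  # (n, d_in, K)
    return np.einsum("nik,iok->no", basis, coef)

def gated_block(h, w_in, w_out):
    """Gated MLP block (assumed GLU-style gate plus residual connection)."""
    u, v = np.split(h @ w_in, 2, axis=-1)  # two halves of the projection
    return h + (u * sigmoid(v)) @ w_out

def kmlp_forward(x, params):
    """KAN front-end, stacked gated blocks, then a sigmoid output head."""
    h = kan_layer(x, params["coef"], params["centers"], params["width"])
    for w_in, w_out in params["blocks"]:
        h = gated_block(h, w_in, w_out)
    return sigmoid(h @ params["head"])  # (n,) probabilities

# Usage: 4 samples with 6 numerical features already scaled to [0, 1].
x = np.random.default_rng(1).random((4, 6))
probs = kmlp_forward(x, init_params(d_in=6, d_model=8, d_hidden=16,
                                    n_basis=5, n_blocks=2))
print(probs.shape)  # (4,)
```

The point of the split is visible in the shapes: the front-end spends its parameters on per-feature univariate functions, while the gated blocks mix the resulting representation across features.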

Paper Structure (20 sections, 8 equations, 3 figures, 10 tables)

Figures (3)

  • Figure 1: Overview of the KMLP-QTL structure. Tabular data features are first processed by QTL for fine-grained numerical representation. The data then flows through the KAN layer to manage feature heterogeneity and complex interactions, followed by stacked gMLP modules to capture deep non-linear interactions. Batch Normalization and Dropout are included in intermediate layers to enhance performance and stability.
  • Figure 2: Quantile Transformation with Linear interpolation (QTL)
  • Figure 3: Data Scale Effects. The KS (left) and AUC (right) of LightGBM and KMLP for training sets of different sizes. When the training set is relatively small, LightGBM still performs better than KMLP. As the training set grows, however, KMLP outperforms LightGBM, achieving a 1.76 improvement in the KS value.
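
Two ideas named in the captions above can be illustrated in a few lines: a quantile transformation with linear interpolation (Figure 2) and the KS metric (Figure 3). The sketch below is an assumption about how such steps are commonly implemented, not the paper's code. The function names (`fit_quantiles`, `qtl_transform`, `ks_statistic`) and the `n_quantiles` parameter are hypothetical, and the KS value is returned on a 0 to 1 scale, which may differ from the scale used in the paper's figures.

```python
import numpy as np

def fit_quantiles(train_col, n_quantiles=100):
    """Estimate empirical quantiles of one numerical feature on training data."""
    probs = np.linspace(0.0, 1.0, n_quantiles)
    return probs, np.quantile(train_col, probs)

def qtl_transform(col, probs, quantiles):
    """Map raw values to [0, 1] by interpolating linearly between quantiles.

    Values outside the training range are clipped to 0 or 1 by np.interp.
    Heavily tied quantiles (repeated values) would need extra handling.
    """
    return np.interp(col, quantiles, probs)

def ks_statistic(y_true, scores):
    """KS metric: largest gap between cumulative positive and negative rates.

    Samples are ranked by descending score; tied scores are not grouped,
    which is a simplification.
    """
    order = np.argsort(-scores, kind="stable")
    y = y_true[order]
    tpr = np.cumsum(y) / y.sum()
    fpr = np.cumsum(1 - y) / (1 - y).sum()
    return float(np.max(np.abs(tpr - fpr)))

# Usage: a heavy-tailed feature becomes approximately uniform on [0, 1].
rng = np.random.default_rng(0)
train = rng.lognormal(mean=0.0, sigma=2.0, size=10_000)
probs, quantiles = fit_quantiles(train)
transformed = qtl_transform(train, probs, quantiles)
print(transformed.min(), transformed.max())

# Usage: perfectly separated scores give the maximum KS on this scale.
print(ks_statistic(np.array([0, 0, 1, 1]), np.array([0.1, 0.2, 0.8, 0.9])))
```

Because the transform depends only on ranks within the training data, it is insensitive to the heavy tails and differing scales that the abstract lists as obstacles for raw numerical features.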