Deep Tabular Representation Corrector

Hangting Ye; Peng Wang; Wei Fan; Xiaozhuang Song; He Zhao; Dandan Gun; Yi Chang

Abstract

Tabular data play a vital role in diverse real-world fields such as healthcare, engineering, and finance. The recent success of deep learning has fostered many tabular learning methods based on deep networks (e.g., Transformer, ResNet). Existing deep tabular machine learning methods generally follow two paradigms: in-learning and pre-learning. In-learning methods train networks from scratch or impose extra constraints to regulate the representations, which requires training multiple tasks simultaneously and makes learning more difficult. Pre-learning methods design several pretext tasks for pre-training and then conduct task-specific fine-tuning, which requires substantial extra training effort and prior knowledge. In this paper, we introduce a novel deep Tabular Representation Corrector, TRC, which enhances the representations of any trained deep tabular model in a model-agnostic manner without altering its parameters. Specifically, targeting the representation shift and representation redundancy that hinder prediction, we propose two tasks: (i) Tabular Representation Re-estimation, which trains a shift estimator to calculate the inherent shift of tabular representations and then mitigates it, thereby re-estimating the representations; and (ii) Tabular Space Mapping, which transforms the re-estimated representations into a light-embedding vector space via a coordinate estimator while preserving crucial predictive information, thereby minimizing redundancy. The two tasks jointly enhance the representations of deep tabular models without touching the original models, and thus enjoy high efficiency. Finally, we conduct extensive experiments with state-of-the-art deep tabular machine learning models coupled with TRC on various tabular benchmarks, and the results show consistent superiority.
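To make the two-stage correction in the abstract concrete, the following is a minimal sketch of how a frozen representation might pass through a shift estimator and then a coordinate estimator. It is not the authors' implementation. The linear estimators, the softmax over coordinates, the subtraction of the estimated shift, and the names `correct_representation`, `shift_params`, `coord_params`, and `basis` are all assumptions made for illustration. The parameters are random rather than trained.

```python
import math
import random


def linear(weights, bias, vec):
    """Affine map; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def correct_representation(z, shift_params, coord_params, basis):
    """Hypothetical corrector applied to a frozen backbone output z."""
    # Task (i), assumed form: estimate the shift and subtract it.
    shift = linear(*shift_params, z)
    z_re = [a - b for a, b in zip(z, shift)]
    # Task (ii), assumed form: predict coordinates over a small set of
    # embedding vectors and return their weighted combination.
    coords = softmax(linear(*coord_params, z_re))
    dim = len(basis[0])
    return [sum(c * vec[j] for c, vec in zip(coords, basis))
            for j in range(dim)]


rng = random.Random(0)
d, k = 4, 3  # representation size and number of embedding vectors (toy values)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


shift_params = (rand_matrix(d, d), [0.0] * d)   # untrained stand-in
coord_params = (rand_matrix(k, d), [0.0] * k)   # untrained stand-in
basis = rand_matrix(k, d)                       # stand-in embedding vectors

z = [0.5, -1.0, 2.0, 0.1]  # stands in for the output of a trained backbone
z_corrected = correct_representation(z, shift_params, coord_params, basis)
print(z_corrected)
```

In this sketch the backbone output `z` is only read, never modified, which mirrors the claim that the original model's parameters stay untouched.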

Paper Structure (29 sections, 20 equations, 22 figures, 14 tables, 1 algorithm)

Introduction
Related Work
In-learning for Tabular Data
Pre-learning for Tabular Data
Problem Formulations
Methodology
Motivation
Tabular Representation Re-estimation
Tabular Space Mapping
Framework Overview
Experiment & Analysis
Experimental Setup
Main Results
Further Analysis
Conclusion
...and 14 more sections

Figures (22)

Figure 1: The performance of deep tabular models with varying noise levels in observations. FT-Transformer, MLP, and DCN2 indicate different deep tabular models. For regression tasks, lower RMSE is better, and for classification tasks, higher accuracy is better.
Figure 2: The relation between model performance and the corresponding SVE values of representations. For each subfigure, we run experiments with multiple random seeds. Deep tabular models with higher SVE often yield lower performance.
Figure 3: The framework of Tabular Representation Corrector (TRC). Subfigure (a) illustrates the overall TRC framework, subfigure (b) presents the training process for the shift estimator of TRC, and subfigure (c) presents the training process for the coordinate estimator of TRC. Here, $z_i = G_f(x_i;\theta_f)$ is the output of any trained backbone, and $z_i$ is enhanced via the two tasks. In subfigure (b), the explicitly perturbed samples are used only for training the shift estimator. At test time, no noise is added to the test dataset: the representations that the trained deep tabular model extracts from the test samples are fed into the shift estimator and then the coordinate estimator to obtain the calibrated representations.
Figure 4: Performance improvement of TRC over various backbone models on large-scale datasets.
Figure 5: Comparison of the $L_1$ norm of the shift information estimated by TRC on data samples with different gradients. For each data sample, the gradient norm is computed as $\|\nabla_{\theta_f}\mathcal{L}(x, y;\theta)\|_1$, where $\theta$ denotes the parameters of the trained deep tabular model. We partition the range of gradient $L_1$ norms into 10 equally spaced intervals from low to high and count the number of samples in each interval.
...and 17 more figures
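
The binning step described in the Figure 5 caption can be sketched as below. This is an illustrative helper rather than the authors' code. The function names `l1_norm` and `bin_counts` are hypothetical, the per-sample gradients are made-up toy numbers, and the convention that the largest value falls into the last interval is an assumption.

```python
def l1_norm(grad):
    """L1 norm of a flattened gradient vector."""
    return sum(abs(g) for g in grad)


def bin_counts(values, num_bins=10):
    """Count values in `num_bins` equally spaced intervals over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: place everything in the first bin
        else:
            # The maximum value is clamped into the last interval (assumption).
            idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


# Toy per-sample gradients (not real data), one flattened vector per sample.
toy_grads = [[0.1, -0.2], [1.0, 0.5], [-2.0, 0.3], [0.0, 0.05]]
norms = [l1_norm(g) for g in toy_grads]
print(bin_counts(norms))
```

Comparing the shift estimator's output norm across these intervals would then amount to grouping samples by their bin index.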
