SwitchTab: Switched Autoencoders Are Effective Tabular Learners

Jing Wu; Suiyao Chen; Qi Zhao; Renat Sergazinov; Chen Li; Shengjie Liu; Chongchao Zhao; Tianpei Xie; Hanqing Guo; Cheng Ji; Daniel Cociorva; Hakan Brunzel

TL;DR

SwitchTab is a novel self-supervised method designed to capture latent dependencies in tabular data. It uses an asymmetric encoder-decoder framework to decouple the mutual and salient features of data pairs, which yields more representative embeddings.

Abstract

Self-supervised representation learning methods have achieved significant success in computer vision and natural language processing, where data samples exhibit explicit spatial or semantic dependencies. However, applying these methods to tabular data is challenging because the dependencies among data samples are less pronounced. In this paper, we address this limitation by introducing SwitchTab, a novel self-supervised method specifically designed to capture latent dependencies in tabular data. SwitchTab leverages an asymmetric encoder-decoder framework to decouple mutual and salient features among data pairs, resulting in more representative embeddings. These embeddings, in turn, contribute to better decision boundaries and lead to improved results in downstream tasks. To validate the effectiveness of SwitchTab, we conduct extensive experiments across various domains involving tabular data. The results show superior performance in end-to-end prediction tasks with fine-tuning. Moreover, we demonstrate that pre-trained salient embeddings can be used as plug-and-play features to enhance the performance of various traditional classification methods (e.g., Logistic Regression and XGBoost). Lastly, we highlight the capability of SwitchTab to create explainable representations through visualization of decoupled mutual and salient features in the latent space.

Paper Structure (26 sections, 3 equations, 4 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Models for Tabular Data Learning and Prediction
Traditional Models.
Deep Learning Models.
Self-supervised Representation Learning
Feature Decoupling
Method
Feature Corruption
Self-supervised Learning
Pre-training with Labels
Downstream Fine-tuning
Experiments and Results
Preliminaries for Experiments
Datasets.
...and 11 more sections

Figures (4)

Figure 1: Given a pair of images, a person can easily distinguish the salient digits from the mutual background because of the well-structured spatial relationships. Distinguishing a pair of tabular samples is harder. For instance, the feature City may be salient between the data points "Chicago" and "New York" in terms of word counts, yet the two values still share some latent mutual information (e.g., both are big cities), which makes decoupling challenging. Note that this decoupling process is for illustration only. In the implementation, all the decoupled samples are computed in the feature space.
Figure 2: Block diagram of the proposed self-supervised learning framework. (1) Two different samples $x_1$ and $x_2$ are randomly corrupted and encoded into feature vectors $z_1$ and $z_2$ through encoder $f$. (2) Feature vectors $z_1$ and $z_2$ are each decoupled into mutual and salient features by two different projectors, $p_m$ and $p_s$, respectively. (3) Mutual and salient features are combined and reconstructed by a decoder $d$, where the salient feature dominates the sample type and the mutual feature provides common information that is switchable between the two samples.
Figure 3: Block diagram of the proposed pre-training framework with labels. (1) Supervised learning: latent feature vectors $z_1$ and $z_2$ are passed through a multi-layer perceptron (MLP) to predict labels. The cross-entropy loss is computed from the predicted labels and the true labels. (2) Self-supervised learning: reconstructed (recovered and switched) data and original encoded data are used for computing the mean squared error (MSE).
Figure 4: t-SNE visualization of mutual and salient features in two-dimensional space.
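
The data flow described in the captions of Figures 2 and 3 can be sketched as a single forward pass. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the layer sizes, the untrained random linear layers, the corruption scheme (resampling values from other rows of the same column), combining features by concatenation, measuring reconstruction error against the uncorrupted inputs, and the unweighted sum of the two losses are all hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, batch, rate=0.3):
    """Assumed corruption: swap a random subset of entries for values
    drawn from random rows of the same column in `batch`."""
    mask = rng.random(x.shape) < rate
    donor_rows = rng.integers(0, batch.shape[0], size=x.shape)
    cols = np.broadcast_to(np.arange(x.shape[1]), x.shape)
    return np.where(mask, batch[donor_rows, cols], x)

def linear(in_dim, out_dim):
    """Untrained affine layer with random weights (illustration only)."""
    W = rng.normal(scale=0.1, size=(in_dim, out_dim))
    b = np.zeros(out_dim)
    return lambda h: h @ W + b

# Hypothetical sizes: batch, input, latent, projection, classes.
n, d_in, d_z, d_p, n_classes = 8, 5, 16, 4, 3

enc = linear(d_in, d_z)
f = lambda x: np.tanh(enc(x))      # encoder f
p_m = linear(d_z, d_p)             # mutual projector p_m
p_s = linear(d_z, d_p)             # salient projector p_s
dec = linear(2 * d_p, d_in)        # decoder d

def decode(m, s):
    # Assumption: "combined" means concatenation of mutual and salient parts.
    return dec(np.concatenate([m, s], axis=1))

def mse(a, b):
    return float(np.mean((a - b) ** 2))

batch = rng.normal(size=(2 * n, d_in))
x1, x2 = batch[:n], batch[n:]

# (1) Corrupt and encode.
z1, z2 = f(corrupt(x1, batch)), f(corrupt(x2, batch))
# (2) Decouple into mutual and salient features.
m1, s1 = p_m(z1), p_s(z1)
m2, s2 = p_m(z2), p_s(z2)
# (3) Recovered samples use each sample's own mutual part;
#     switched samples borrow the mutual part of the other sample.
x1_rec, x2_rec = decode(m1, s1), decode(m2, s2)
x1_sw, x2_sw = decode(m2, s1), decode(m1, s2)

recon_loss = (mse(x1_rec, x1) + mse(x1_sw, x1)
              + mse(x2_rec, x2) + mse(x2_sw, x2))

# Optional supervised term (Figure 3): a small MLP head on z with cross-entropy.
h1, h2 = linear(d_z, 8), linear(8, n_classes)
mlp = lambda z: h2(np.maximum(h1(z), 0.0))

def cross_entropy(logits, y):
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(y)), y]))

y1 = rng.integers(0, n_classes, size=n)   # made-up labels for the sketch
y2 = rng.integers(0, n_classes, size=n)
ce_loss = cross_entropy(mlp(z1), y1) + cross_entropy(mlp(z2), y2)

total_loss = recon_loss + ce_loss         # assumed unweighted sum
```

In this sketch the salient part always comes from the sample being reconstructed, so a switched output such as `x1_sw` is still compared with `x1`; only the mutual part is exchanged between the two samples.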
