Binning as a Pretext Task: Improving Self-Supervised Learning in Tabular Domains

Kyungeun Lee; Ye Seul Sim; Hye-Seung Cho; Moonjung Eo; Suhee Yoon; Sanghyu Yoon; Woohyung Lim

TL;DR

This work tackles the challenge of learning meaningful representations from tabular data with heterogeneous features by introducing a binning-based pretext task for self-supervised learning. The core idea is to train encoders to predict discretized bin indices $t_i^j \in \{1, \dots, T\}$ instead of reconstructing raw values, providing an inductive bias toward irregular, piecewise-constant mappings and standardizing feature distributions. Across 25 public tabular datasets, the proposed BinRecon and BinXent objectives yield consistent improvements in unsupervised representation learning and strong fine-tuning performance, often surpassing tree-based baselines and competitive deep models. The approach is architecture-agnostic and compatible with various input transformations, with code available publicly for replication and extension.

Abstract

The ability of deep networks to learn superior representations hinges on leveraging the proper inductive biases, considering the inherent properties of datasets. In tabular domains, it is critical to handle heterogeneous features (both categorical and numerical) in a unified manner and to capture irregular functions such as piecewise constant functions. To address these challenges within the self-supervised learning framework, we propose a novel pretext task based on the classical binning method. The idea is straightforward: reconstruct the bin indices (either as orders or as classes) rather than the original values. This pretext task gives the encoder an inductive bias toward capturing irregular dependencies, mapping continuous inputs to discretized bins, and mitigates feature heterogeneity by giving all features category-type targets. Our empirical investigations identify several advantages of binning: capturing irregular functions, compatibility with encoder architectures and additional modifications, standardizing all features into equal sets, grouping similar values within a feature, and providing ordering information. Comprehensive evaluations across diverse tabular datasets corroborate that our method consistently improves tabular representation learning performance for a wide range of downstream tasks. The code is available at https://github.com/kyungeun-lee/tabularbinning.
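As a rough illustration of the target construction described in the abstract, the sketch below fits quantile cut points on one training column and maps each value to a bin index, in both ordinal and one-hot form. It is a minimal standard-library sketch, not the authors' implementation: the function names, the toy values, and the choice of $T = 4$ are all assumptions made for this example.

```python
from bisect import bisect_right
from statistics import quantiles


def fit_quantile_edges(train_column, num_bins):
    """Estimate the num_bins - 1 interior cut points from training values only."""
    # statistics.quantiles returns n - 1 cut points that split the data
    # into n groups of roughly equal size.
    return quantiles(train_column, n=num_bins, method="inclusive")


def to_bin_index(value, edges):
    """Map a raw value to a bin index in {1, ..., T}, with T = len(edges) + 1."""
    return bisect_right(edges, value) + 1


def to_one_hot(index, num_bins):
    """Nominal form of the target: a one-hot vector without order information."""
    return [1 if k == index else 0 for k in range(1, num_bins + 1)]


# Toy feature column (hypothetical values, not taken from the paper).
train_column = [0.1, 0.4, 0.5, 0.9, 1.3, 2.0, 2.2, 3.5, 4.1, 7.8, 9.0, 12.5]
T = 4  # number of bins; an assumption for this toy example

edges = fit_quantile_edges(train_column, T)

# Ordinal targets (for a regression-style pretext task).
ordinal_targets = [to_bin_index(v, edges) for v in train_column]

# Nominal targets (for a classification-style pretext task).
nominal_targets = [to_one_hot(i, T) for i in ordinal_targets]

# Unseen values reuse the training edges; values outside the training
# range fall into the first or last bin.
unseen_targets = [to_bin_index(v, edges) for v in [-1.0, 3.0, 100.0]]
```

Because the cut points come only from the training column, every feature ends up with targets from the same set $\{1, \dots, T\}$ regardless of its original scale, which is the standardizing effect the abstract refers to.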

Paper Structure (36 sections, 2 equations, 6 figures, 13 tables)


Introduction
Related works
Tabular deep learning:
Self-supervised learning in tabular domains:
Backgrounds
Input transformation:
SSL objectives:
Methods: Binning as a Pretext Task for Tabular SSL
Experiments
Comparison with the unsupervised methods: Linear evaluation results
Binary classification:
Multiclass classification:
Regression:
Comparison with the supervised methods: Fine-tuning results
Discussion
...and 21 more sections

Figures (6)

Figure 1: Binning as a pretext task. Bins are determined for each feature from the distribution of the training dataset. The inputs are passed through the encoder network, and the decoder network then predicts the bin indices, which are treated as ordinal when the pretext task is regression and as nominal when it is classification.
Figure 2: An illustration of two methods to generate the replacing vectors for masked features.
Figure 3: An example of binning (dataset: Wine Quality; Cortez et al., 2009). In this example, we set $T$ to 10. For each feature, bin boundaries are chosen on the training dataset so that every bin contains the same number of observations. We then use the bin indices as targets for auto-encoding-based SSL. When the bin indices are treated as classes without order information, they are converted into one-hot vectors.
Figure 4: Visualization analysis using the HO dataset. For better interpretability, we apply PCA to the representation vectors learned with different objective functions and plot the first two principal components. Colors denote the bin index of each sample.
Figure 5: Relative performance when the binning method is changed from quantile-based to equal-width. Positive values indicate that quantile-based binning is better than equal-width binning; negative values indicate the opposite. For regression tasks in particular, quantile-based binning is much better than equal-width binning.
...and 1 more figures
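
The two binning schemes compared in the Figure 5 caption can be contrasted on a single skewed feature. The sketch below is illustrative only: the helper names, the toy values, and $T = 4$ are assumptions, and it shows only how the bin occupancy differs, not the downstream performance gap reported in the paper.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles


def quantile_edges(values, num_bins):
    # Interior cut points chosen so each bin holds roughly the same number of values.
    return quantiles(values, n=num_bins, method="inclusive")


def equal_width_edges(values, num_bins):
    # Interior cut points that split [min, max] into intervals of equal length.
    lo, hi = min(values), max(values)
    step = (hi - lo) / num_bins
    return [lo + step * k for k in range(1, num_bins)]


def bin_indices(values, edges):
    # Bin indices in {1, ..., T}, where T = len(edges) + 1.
    return [bisect_right(edges, v) + 1 for v in values]


# Skewed toy feature (hypothetical values, not from any dataset in the paper).
feature = [0.1, 0.4, 0.5, 0.9, 1.3, 2.0, 2.2, 3.5, 4.1, 7.8, 9.0, 12.5]
T = 4  # number of bins; an assumption for this toy example

quantile_counts = Counter(bin_indices(feature, quantile_edges(feature, T)))
width_counts = Counter(bin_indices(feature, equal_width_edges(feature, T)))

print("quantile bins:   ", sorted(quantile_counts.items()))
print("equal-width bins:", sorted(width_counts.items()))
```

On this toy column, quantile-based binning places three values in every bin, while equal-width binning puts more than half of the values in the first bin, so most samples would share a single target.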
