Binning as a Pretext Task: Improving Self-Supervised Learning in Tabular Domains

Kyungeun Lee, Ye Seul Sim, Hye-Seung Cho, Moonjung Eo, Suhee Yoon, Sanghyu Yoon, Woohyung Lim

TL;DR

This work tackles the challenge of learning meaningful representations from tabular data with heterogeneous features by introducing a binning-based pretext task for self-supervised learning. The core idea is to train encoders to predict discretized bin indices $t_i^j \in \{1, \dots, T\}$ instead of reconstructing raw values, providing an inductive bias toward irregular, piecewise-constant mappings and standardizing feature distributions. Across 25 public tabular datasets, the proposed BinRecon and BinXent objectives yield consistent improvements in unsupervised representation learning and strong fine-tuning performance, often surpassing tree-based baselines and competitive deep models. The approach is architecture-agnostic and compatible with various input transformations, with code available publicly for replication and extension.
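As a rough illustration of the target construction described above, the sketch below maps one numerical feature to bin indices in $\{1, \dots, T\}$ using quantiles of the training data. The function name `quantile_bin_indices`, the synthetic log-normal feature, and the choice of quantile-based edges are assumptions made for this example; they are not taken from the authors' code.

```python
import numpy as np

def quantile_bin_indices(x_train, x, num_bins):
    """Map each value in x to a bin index in {1, ..., num_bins}.

    Bin edges are empirical quantiles of x_train (a single feature column).
    Hypothetical helper for illustration only.
    """
    # Interior cut points only: num_bins - 1 evenly spaced quantile levels.
    probs = np.linspace(0.0, 1.0, num_bins + 1)[1:-1]
    edges = np.quantile(x_train, probs)
    # searchsorted yields 0..num_bins-1; shift so indices start at 1.
    return np.searchsorted(edges, x, side="right") + 1

rng = np.random.default_rng(0)
x_train = rng.lognormal(size=1000)  # synthetic skewed feature (assumption)
t = quantile_bin_indices(x_train, x_train, num_bins=10)
counts = np.bincount(t, minlength=11)[1:]  # observations per bin, bins 1..10
```

With quantile-based edges, each bin holds roughly the same number of training observations regardless of how skewed the raw feature is, which is one way to read the "standardizing feature distributions" claim.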

Abstract

The ability of deep networks to learn superior representations hinges on leveraging the proper inductive biases, considering the inherent properties of datasets. In tabular domains, it is critical to effectively handle heterogeneous features (both categorical and numerical) in a unified manner and to grasp irregular functions like piecewise constant functions. To address these challenges within the self-supervised learning framework, we propose a novel pretext task based on the classical binning method. The idea is straightforward: reconstructing the bin indices (either orders or classes) rather than the original values. This pretext task provides the encoder with an inductive bias to capture the irregular dependencies, mapping from continuous inputs to discretized bins, and mitigates the feature heterogeneity by setting all features to have category-type targets. Our empirical investigations ascertain several advantages of binning: capturing the irregular function, compatibility with encoder architecture and additional modifications, standardizing all features into equal sets, grouping similar values within a feature, and providing ordering information. Comprehensive evaluations across diverse tabular datasets corroborate that our method consistently improves tabular representation learning performance for a wide range of downstream tasks. The code is available at https://github.com/kyungeun-lee/tabularbinning.
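The abstract distinguishes two kinds of bin targets: orders (ordinal) and classes (nominal). A minimal NumPy sketch of one plausible loss for each is given below. The exact loss definitions, any rescaling of the indices, and the function names are not stated in this section and are assumptions here; the sketch is not the authors' BinRecon or BinXent implementation.

```python
import numpy as np

def bin_regression_loss(pred, bins):
    """Mean squared error against ordinal bin indices.

    pred: (n, d) real-valued decoder outputs.
    bins: (n, d) integer bin indices in 1..T.
    """
    return float(np.mean((pred - bins) ** 2))

def bin_classification_loss(logits, bins):
    """Mean cross-entropy, treating each feature's bin as one of T classes.

    logits: (n, d, T) unnormalized scores per feature.
    bins:   (n, d) integer bin indices in 1..T.
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the true bin (convert to 0-based index).
    picked = np.take_along_axis(log_probs, (bins - 1)[..., None], axis=-1)
    return float(-picked.mean())

rng = np.random.default_rng(0)
n, d, T = 4, 3, 5  # toy sizes (assumption)
bins = rng.integers(1, T + 1, size=(n, d))
perfect = bin_regression_loss(bins.astype(float), bins)        # exact predictions
uniform = bin_classification_loss(np.zeros((n, d, T)), bins)   # uninformed logits
```

In both variants every feature, whether originally categorical or numerical, ends up with a target drawn from the same set of $T$ values, which is how the pretext task sidesteps feature heterogeneity.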

Paper Structure (36 sections, 2 equations, 6 figures, 13 tables)

Figures (6)

  • Figure 1: Binning as a pretext task. Bins are determined for each feature from the distribution of the training dataset. The inputs are passed through the encoder network, and the decoder network then predicts the bin indices, which are treated as ordinal when the pretext task is regression and as nominal when it is classification.
  • Figure 2: An illustration of two methods to generate the replacing vectors for masked features.
  • Figure 3: An example of binning (dataset: Wine Quality; Cortez et al., 2009). In this example, $T = 10$. For each feature, bin boundaries are chosen from the training dataset so that every bin contains the same number of observations. The resulting bin indices serve as the targets for auto-encoding-based SSL. When the bin indices are treated as classes without order information, they are converted into one-hot vectors.
  • Figure 4: Visualization analysis on the HO dataset. For better interpretability, we apply PCA to the representation vectors learned with the different objective functions and plot the first two principal components. Colors denote the bin index of each sample.
  • Figure 5: Relative performance when the binning method is changed from quantile-based to equal-width. Positive values mean that quantile-based binning performs better; negative values mean that equal-width binning performs better. For regression tasks in particular, quantile-based binning is much better than equal-width binning.
  • ...and 1 more figures
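
The captions for Figures 3 and 5 contrast quantile-based and equal-width binning and mention converting bin indices to one-hot vectors. The sketch below shows both steps on a synthetic skewed feature. The helper names `bin_edges` and `to_one_hot`, the 0-based indexing, and the log-normal data are assumptions for illustration; the Wine Quality data is not used here.

```python
import numpy as np

def bin_edges(x_train, num_bins, method):
    """Interior bin edges from training data: 'quantile' or 'equal_width'."""
    probs = np.linspace(0.0, 1.0, num_bins + 1)[1:-1]
    if method == "quantile":
        # Edges placed so each bin holds about the same number of samples.
        return np.quantile(x_train, probs)
    if method == "equal_width":
        # Edges evenly spaced between the training minimum and maximum.
        lo, hi = x_train.min(), x_train.max()
        return lo + probs * (hi - lo)
    raise ValueError(f"unknown method: {method}")

def to_one_hot(indices, num_bins):
    """Convert 0-based bin indices to one-hot rows (nominal targets)."""
    return np.eye(num_bins)[indices]

rng = np.random.default_rng(0)
x_train = rng.lognormal(size=1000)  # synthetic skewed feature (assumption)
T = 10
occupancy = {}
for method in ("quantile", "equal_width"):
    idx = np.searchsorted(bin_edges(x_train, T, method), x_train, side="right")
    occupancy[method] = np.bincount(idx, minlength=T)  # samples per bin
one_hot = to_one_hot(idx, T)  # one-hot targets for the last method computed
```

On a skewed feature, equal-width bins concentrate most samples in a few bins while quantile bins stay balanced, which is consistent with the direction of the gap that the Figure 5 caption reports, though this toy comparison says nothing about downstream accuracy.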