Hybrid Autoencoders for Tabular Data: Leveraging Model-Based Augmentation in Low-Label Settings
Erel Naor, Ofir Lindenbaum
TL;DR
TANDEM addresses learning from limited labeled tabular data with a hybrid self-supervised autoencoder that couples a neural encoder with an oblivious soft decision tree (OSDT) encoder. Each encoder receives a sample-specific masked view produced by its own gating network, and both decode through a shared decoder, with cross-view reconstruction and latent-space alignment losses guiding joint training. Inference relies solely on the neural encoder, preserving flexibility and SSL compatibility. Spectral analysis reveals complementary inductive biases: the neural path favors smooth, low-frequency functions, which are less suited to tabular structure, whereas the OSDT path captures sharp, localized, high-frequency patterns. Empirically, TANDEM improves low-label classification and regression across diverse tabular datasets, outperforming deep and tree-based supervised baselines; ablations confirm the necessity of both encoders and gating, and frequency-decomposition analyses highlight the advantages of model-based augmentation.
Abstract
Deep neural networks often underperform on tabular data because they are sensitive to irrelevant features and have a spectral bias toward smooth, low-frequency functions. These limitations hinder their ability to capture the sharp, high-frequency signals that often define tabular structure, especially when labeled samples are scarce. While self-supervised learning (SSL) offers promise in such settings, it remains challenging in tabular domains due to the lack of effective data augmentations. We propose a hybrid autoencoder that combines a neural encoder with an oblivious soft decision tree (OSDT) encoder, each guided by its own stochastic gating network that performs sample-specific feature selection. Together, these structurally different encoders and model-specific gating networks implement model-based augmentation, producing complementary input views tailored to each architecture. The two encoders, trained with a shared decoder and a cross-reconstruction loss, learn distinct yet aligned representations that reflect their respective inductive biases. During training, the OSDT encoder (robust to noise and effective at modeling localized, high-frequency structure) guides the neural encoder toward representations better aligned with tabular data. At inference, only the neural encoder is used, preserving flexibility and SSL compatibility. Spectral analysis highlights the distinct inductive biases of each encoder. Our method achieves consistent gains in low-label classification and regression across diverse tabular datasets, outperforming deep and tree-based supervised baselines.
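To make the architecture in the abstract concrete, the following is a minimal forward-pass sketch in NumPy, not the authors' implementation. It assumes a clipped Gaussian relaxation for the stochastic gates, a one-hidden-layer neural encoder, a linear shared decoder, and mean-squared-error reconstruction and alignment terms. Here each branch reconstructs the unmasked input from its own gated view; the paper's exact cross-reconstruction target may differ. All function names, parameter names, and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def stochastic_gate(x, W_g, noise_std=0.5, train=True):
    """Sample-specific gate in [0, 1] per feature.
    Assumption: clipped Gaussian relaxation; the paper's gate may differ."""
    mu = x @ W_g                      # (n, d) gate pre-activations from the sample
    eps = rng.normal(0.0, noise_std, size=mu.shape) if train else 0.0
    return np.clip(mu + eps + 0.5, 0.0, 1.0)

def neural_encoder(x, W1, W2):
    """Assumed one-hidden-layer MLP encoder."""
    return np.tanh(x @ W1) @ W2

def osdt_encoder(x, F, b, leaf_values, temperature=1.0):
    """Oblivious soft decision tree: every node at a given depth shares
    one soft split (feature weights F[:, level], threshold b[level])."""
    p_right = sigmoid((x @ F - b) / temperature)      # (n, depth)
    leaf_prob = np.ones((x.shape[0], 1))
    for level in range(p_right.shape[1]):
        pr = p_right[:, level:level + 1]
        # Each existing path splits into a left and a right child.
        leaf_prob = np.concatenate([leaf_prob * (1 - pr), leaf_prob * pr], axis=1)
    # leaf_prob: (n, 2**depth); latent code is the expected leaf embedding.
    return leaf_prob @ leaf_values, leaf_prob

def hybrid_losses(x, p):
    """Two gated views -> two encoders -> one shared linear decoder."""
    view_nn = x * stochastic_gate(x, p["Wg_nn"])      # view for the neural branch
    view_tree = x * stochastic_gate(x, p["Wg_tree"])  # view for the OSDT branch
    z_nn = neural_encoder(view_nn, p["W1"], p["W2"])
    z_tree, leaf_prob = osdt_encoder(view_tree, p["F"], p["b"], p["leaves"])
    recon = (np.mean((z_nn @ p["Wd"] - x) ** 2)
             + np.mean((z_tree @ p["Wd"] - x) ** 2))  # reconstruct the full input
    align = np.mean((z_nn - z_tree) ** 2)             # latent-space alignment
    return recon, align, z_nn, z_tree, leaf_prob

# Hypothetical sizes: 8 samples, 5 features, 3 latent dims, tree depth 2.
n, d, h, k, depth = 8, 5, 16, 3, 2
params = {
    "Wg_nn": rng.normal(0, 0.1, (d, d)), "Wg_tree": rng.normal(0, 0.1, (d, d)),
    "W1": rng.normal(0, 0.3, (d, h)), "W2": rng.normal(0, 0.3, (h, k)),
    "F": rng.normal(0, 1.0, (d, depth)), "b": np.zeros(depth),
    "leaves": rng.normal(0, 1.0, (2 ** depth, k)),
    "Wd": rng.normal(0, 0.3, (k, d)),
}
x = rng.normal(size=(n, d))
recon, align, z_nn, z_tree, leaf_prob = hybrid_losses(x, params)
print("reconstruction loss:", recon, "alignment loss:", align)
```

In a real training loop these two terms would be combined and minimized with an autodiff framework. At inference only `neural_encoder` (with its gate in deterministic mode, `train=False`) would be kept, matching the abstract's statement that the OSDT branch is used during training only.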
