Unlocking the Transferability of Tokens in Deep Models for Tabular Data
Qi-Le Zhou, Han-Jia Ye, Le-Ye Wang, De-Chuan Zhan
TL;DR
The paper tackles transfer learning for deep tabular models when the upstream and downstream feature spaces differ. It introduces TabToken, a token-centric approach that uses a Contrastive Token Regularization (CTR) loss during pre-training to give feature tokens (embeddings of tabular features) richer semantics. During fine-tuning, tokens of overlapping features are kept fixed, while tokens of unseen features are initialized by averaging and regularized to preserve the semantic structure, which enables effective transfer with limited data. The regularized training objective is $\min_{f_0=g_0\circ h_0} \sum_{i=1}^N \ell(g_0(h_0(\boldsymbol{x}_{i,:})), y_i) + \beta\, \Omega(\{\boldsymbol{T}_i\})$, where $\beta$ weights the token regularizer $\Omega$. Experiments on 10 tabular datasets show strong cross-feature transfer performance and improved discriminative power on standard classification and regression tasks, highlighting the practical value of enriching feature token semantics in tabular deep learning.
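To make the shape of the objective in the TL;DR concrete, the sketch below evaluates a task loss plus $\beta\,\Omega$ with NumPy. The exact form of $\Omega$ is not given in this summary, so the regularizer here is an assumption chosen for illustration: each instance's feature tokens are averaged into one instance token, and instance tokens are pulled toward the center of their class. The function names `instance_tokens`, `center_regularizer`, and `objective` are hypothetical.

```python
import numpy as np

def instance_tokens(tokens):
    """Average the per-feature tokens of each instance: (N, d, k) -> (N, k)."""
    return tokens.mean(axis=1)

def center_regularizer(tokens, labels):
    """Assumed stand-in for Omega: mean squared distance between each
    instance token and the center of its class."""
    inst = instance_tokens(tokens)
    total = 0.0
    for c in np.unique(labels):
        members = inst[labels == c]
        center = members.mean(axis=0)
        total += ((members - center) ** 2).sum()
    return total / len(inst)

def objective(task_losses, tokens, labels, beta=1.0):
    """Sum of per-instance task losses plus beta times the token regularizer."""
    return task_losses.sum() + beta * center_regularizer(tokens, labels)

# Toy data: N=4 instances, d=3 features, k=2 token dimensions (all made up).
rng = np.random.default_rng(0)
toy_tokens = rng.normal(size=(4, 3, 2))
toy_labels = np.array([0, 0, 1, 1])
toy_losses = np.array([0.3, 0.1, 0.4, 0.2])
toy_value = objective(toy_losses, toy_tokens, toy_labels, beta=0.5)
```

In a real training loop the tokens would be learnable parameters and the whole expression would be minimized with an autodiff framework; the NumPy version only shows how the two terms combine.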
Abstract
Fine-tuning a pre-trained deep neural network has become a successful paradigm in various machine learning tasks. However, this paradigm becomes particularly challenging with tabular data when there are discrepancies between the feature sets of pre-trained models and the target tasks. In this paper, we propose TabToken, a method that aims to enhance the quality of feature tokens (i.e., embeddings of tabular features). TabToken allows for the utilization of pre-trained models when the upstream and downstream tasks share overlapping features, facilitating model fine-tuning even with limited training examples. Specifically, we introduce a contrastive objective that regularizes the tokens, capturing the semantics within and across features. During the pre-training stage, the tokens are learned jointly with top-layer deep models such as a transformer. In the downstream task, tokens of the shared features are kept fixed while TabToken efficiently fine-tunes the remaining parts of the model. TabToken not only enables knowledge transfer from a pre-trained model to tasks with heterogeneous features, but also enhances the discriminative ability of deep tabular models in standard classification and regression tasks.
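The downstream token handling described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes tokens are stored per feature name, and that "initialized by averaging" means starting each unseen feature's token at the mean of the shared pre-trained tokens. The helper `init_downstream_tokens` and the feature names are hypothetical.

```python
import numpy as np

def init_downstream_tokens(pretrained, downstream_features):
    """Build downstream tokens from pre-trained ones.

    Shared features reuse their pre-trained token and are marked frozen.
    Unseen features start from the mean of the shared tokens (an assumption)
    and are marked trainable.
    """
    shared = [f for f in downstream_features if f in pretrained]
    if not shared:
        raise ValueError("no overlapping features to transfer from")
    mean_token = np.mean([pretrained[f] for f in shared], axis=0)

    tokens, trainable = {}, {}
    for f in downstream_features:
        if f in pretrained:
            tokens[f] = pretrained[f].copy()
            trainable[f] = False  # kept fixed during fine-tuning
        else:
            tokens[f] = mean_token.copy()
            trainable[f] = True   # updated during fine-tuning
    return tokens, trainable

# Made-up upstream tokens; "zip" does not appear downstream, "bmi" is new.
upstream = {
    "age": np.array([1.0, 3.0]),
    "income": np.array([3.0, 5.0]),
    "zip": np.array([100.0, 100.0]),
}
down_tokens, down_trainable = init_downstream_tokens(
    upstream, ["age", "income", "bmi"]
)
```

In a deep learning framework, the `trainable` flags would map onto whichever mechanism excludes parameters from gradient updates, while the remaining parts of the model are fine-tuned as usual.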
