LLM Attention Transplant for Transfer Learning of Tabular Data Across Disparate Domains

Ibna Kowsar, Kazi F. Akhter, Manar D. Samad

TL;DR

This work tackles transfer learning for tabular data across domains with disjoint feature spaces, where traditional deep learning struggles. It introduces LATTLE, a lightweight framework that fine-tunes a small LLM on source data and transplants selected attention weights into a gated Feature Tokenizer Transformer (gFTT), enabling cross-domain transfer without shared features or prompt engineering. In experiments on ten cross-domain pairs from OpenML, LATTLE generally outperforms conventional ML, deep tabular models, and existing transfer-learning approaches, demonstrating effective cross-domain context learning via cross-attention. The approach reduces data and compute requirements while maintaining robust performance, suggesting a practical pathway for cross-domain tabular transfer using a compact LLM and a tabular transformer. The work also shows that transferring only the uppermost LLM layer’s weights is advantageous and that a single source dataset can suffice for transfer when cross-attention is employed.

Abstract

Transfer learning of tabular data is non-trivial due to heterogeneity in the feature space across disparate domains. The limited success of traditional deep learning in tabular knowledge transfer can be advanced by leveraging large language models (LLMs). However, the efficacy of LLMs often stagnates for mixed data types structured in tables due to the limitations of text prompts and in-context learning. We propose a lightweight transfer learning framework that fine-tunes an LLM using source tabular data and transplants the LLM's selective $key$ and $value$ projection weights into a gated feature tokenized transformer (gFTT) built for tabular data. The gFTT model with cross-domain attention is fine-tuned using target tabular data for transfer learning, eliminating the need for shared features, LLM prompt engineering, and large-scale pretrained models. Our experiments using ten pairs of source-target data sets and 12 baselines demonstrate the superiority of the proposed LLM-attention transplant for transfer learning (LATTLE) method over traditional ML models, state-of-the-art deep tabular architectures, and transfer learning models trained on thousands to billions of tabular samples. The proposed attention transfer demonstrates an effective solution to learning relationships between data tables using an LLM in a low-resource learning environment. The source code for the proposed method is publicly available.
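The transplant step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the source model stores its attention projection as one fused query/key/value matrix of shape (d_model, 3 * d_model), uses random arrays as stand-ins for fine-tuned LLM weights, and omits the gating in gFTT. The names `split_fused_qkv` and `attention_with_transplant` are hypothetical.

```python
import numpy as np


def split_fused_qkv(fused_weight):
    """Split a fused (d_model, 3 * d_model) matrix into query, key, value blocks."""
    d_model = fused_weight.shape[0]
    if fused_weight.shape != (d_model, 3 * d_model):
        raise ValueError("expected shape (d_model, 3 * d_model)")
    w_q, w_k, w_v = np.split(fused_weight, 3, axis=1)
    return w_q, w_k, w_v


def softmax(scores):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def attention_with_transplant(tokens, w_q_target, w_k_source, w_v_source):
    """Scaled dot-product attention over target feature tokens.

    Assumption: queries use the target model's own projection, while keys and
    values use the projections transplanted from the source-tuned model.
    """
    q = tokens @ w_q_target
    k = tokens @ w_k_source
    v = tokens @ w_v_source
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v, weights


rng = np.random.default_rng(0)
d_model, n_features = 16, 5

# Stand-in for the fused attention weights of a source-tuned LLM layer.
source_fused = rng.normal(size=(d_model, 3 * d_model))
_, w_k, w_v = split_fused_qkv(source_fused)  # keep only key and value blocks

# Target-side query projection and feature-token embeddings (both random here).
w_q = 0.1 * rng.normal(size=(d_model, d_model))
tokens = rng.normal(size=(n_features, d_model))

out, attn = attention_with_transplant(tokens, w_q, w_k, w_v)
```

In this sketch only the key and value blocks are copied across; the query projection stays on the target side, so the target model would still learn how to attend over the transplanted representations during fine-tuning.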

Paper Structure

This paper contains 19 sections, 3 equations, 3 figures, 7 tables, 1 algorithm.

Figures (3)

  • Figure 1: The proposed LLM attention transplant for transfer learning (LATTLE) framework. The attention-related weights are transplanted from a lightweight LLM (DistilGPT2) fine-tuned by source data to a Gated Feature Tokenizer Transformer (gFTT) to be fine-tuned using target data. Emb. = Embedding, Cat. = categorical and Num. = numerical features.
  • Figure 2: Loss curves for LLM-gFTT transfer learning between the Credit-g and Diabetes data domains. (a) Supervised fine-tuning of the LLM using the source data set (Credit-g); (b) Fine-tuning of gFTT using the target data set (Diabetes) after cross-attention weight transfer.
  • Figure 3: Effect of individual source data sets used in LLM pretraining on downstream transfer learning AUC scores. cmc is the target data set.