LLM Embeddings Improve Test-time Adaptation to Tabular $Y|X$-Shifts

Yibo Zeng; Jiashuo Liu; Henry Lam; Hongseok Namkoong

TL;DR

This paper addresses $Y|X$-shifts in tabular data by proposing serialization-based LLM embeddings to produce informative representations for test-time adaptation with limited target labels. It combines these embeddings with optional domain information and trains a shallow neural network, evaluating multiple target-adaptation strategies (in-context domain info, full fine-tuning, LoRA, and prefix tuning) across three real-world datasets and thousands of configurations. The key findings show that LLM embeddings alone provide inconsistent robustness gains, but finetuning with as few as 32 target samples yields substantial improvements, especially under stronger $Y|X$-shifts, and that the effectiveness of domain information and sample allocation is dataset-dependent. Overall, the work demonstrates a practical, data-efficient path to improve tabular predictions under distribution shifts and offers theoretical insights linking LLM-based representations to reduced target-risk in domain adaptation.

Abstract

For tabular datasets, changes in the relationship between the label and covariates ($Y|X$-shifts) are common due to missing variables (a.k.a. confounders). Since it is impossible to generalize to a completely new and unknown domain, we study models that are easy to adapt to the target domain even with few labeled examples. We focus on building more informative representations of tabular data that can mitigate $Y|X$-shifts, and propose to leverage the prior world knowledge in LLMs by serializing (writing down) the tabular data and encoding it. We find that LLM embeddings alone provide inconsistent improvements in robustness, but models trained on them can be adapted/finetuned well to the target domain using as few as 32 labeled observations. Our findings are based on a comprehensive and systematic study consisting of 7650 source-target pairs, benchmarked against 261,000 model configurations trained by 22 algorithms. Our observations hold when ablating the size of accessible target data and the adaptation strategy. The code is available at https://github.com/namkoong-lab/LLM-Tabular-Shifts.
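To make the serialization step concrete, here is a minimal sketch of how one tabular record might be written down as text before it is passed to an LLM encoder. The function name `serialize_row`, the sentence template, the `domain_info` argument, and the example feature names are all illustrative assumptions; the abstract does not specify the template the authors use, and the embedding and fine-tuning steps are omitted.

```python
from typing import Mapping


def serialize_row(row: Mapping[str, object], domain_info: str = "") -> str:
    """Write one tabular record down as plain text.

    Hypothetical template: one short sentence per feature, in column order.
    The paper's actual wording may differ.
    """
    parts = [f"The {name} is {value}." for name, value in row.items()]
    text = " ".join(parts)
    # Optionally prepend free-text domain information (an assumption about
    # how such context could be combined with the serialized features).
    return f"{domain_info} {text}" if domain_info else text


# Made-up example record; the feature names are not taken from the paper.
example = {"age": 42, "occupation": "nurse", "state": "CA"}
print(serialize_row(example))
```

The resulting string would then be embedded by an LLM, and a shallow network trained on those embeddings could be adapted with a small number of labeled target examples.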
