Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning

Sungwon Han; Jinsung Yoon; Sercan O Arik; Tomas Pfister

TL;DR

This work tackles data-efficient, few-shot tabular learning by using Large Language Models (LLMs) as feature engineers rather than end-to-end predictors. FeatLLM prompts an LLM to extract class-specific rules, converts these rules into binary features, and uses a simple linear model with non-negative weights and a softmax to estimate class likelihoods, while using bagging to stay within prompt-size limits and improve robustness. The approach requires only API access to LLMs and avoids per-sample LLM queries at inference time, which keeps practical inference cost low. Across 13 tabular datasets, FeatLLM achieves top or near-top performance, outperforming strong baselines like TabLLM and STUNT by approximately 10% on average, with ablations showing the importance of tuning, ensembling, and reasoning guidance. The results highlight the potential of combining prior LLM knowledge with few-shot data to generate informative, interpretable rule-based features for efficient, scalable tabular learning.
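The pipeline summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the rules, column names (`age`, `cholesterol`, `exercise_hours`), class labels, and weights are all hypothetical, the weights are fixed by hand rather than trained on few-shot examples, and clamping at zero is just one assumed way to keep weights non-negative. The sketch shows rules turned into binary features, a non-negative weighted sum per class passed through a softmax, and likelihoods averaged over ensemble members.

```python
import math

# Hypothetical rules of the kind an LLM might propose per class (illustrative only).
# Each rule is a predicate over one table row, represented as a dict.
RULES = {
    "high_risk": [lambda r: r["age"] > 60, lambda r: r["cholesterol"] > 240],
    "low_risk": [lambda r: r["age"] <= 60, lambda r: r["exercise_hours"] >= 3],
}

# Hand-picked weights, one per rule; in practice these would be learned (assumption).
WEIGHTS = {"high_risk": [1.0, 1.0], "low_risk": [1.0, 1.0]}


def binary_features(row, rules):
    """Return, for each class, a 0/1 vector marking which of its rules the row satisfies."""
    return {cls: [1 if rule(row) else 0 for rule in cls_rules]
            for cls, cls_rules in rules.items()}


def class_likelihoods(row, rules, weights):
    """Score each class by a non-negative weighted sum of its rule features, then softmax."""
    feats = binary_features(row, rules)
    scores = {
        # max(w, 0.0) clamps weights at zero: an assumed way to enforce non-negativity.
        cls: sum(max(w, 0.0) * f for w, f in zip(weights[cls], feats[cls]))
        for cls in rules
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {cls: math.exp(s - top) for cls, s in scores.items()}
    total = sum(exps.values())
    return {cls: e / total for cls, e in exps.items()}


def bagged_likelihoods(row, members):
    """Average class likelihoods over several (rules, weights) ensemble members."""
    all_probs = [class_likelihoods(row, r, w) for r, w in members]
    return {cls: sum(p[cls] for p in all_probs) / len(all_probs)
            for cls in all_probs[0]}


row = {"age": 65, "cholesterol": 250, "exercise_hours": 1}
probs = bagged_likelihoods(row, [(RULES, WEIGHTS)])
print(probs)
```

Once the rules and weights are fixed, scoring a new row needs no LLM call, which is the source of the low inference cost described above.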

Abstract

Large Language Models (LLMs), with their remarkable ability to tackle challenging and unseen reasoning problems, hold immense potential for tabular learning, which is vital for many real-world applications. In this paper, we propose a novel in-context learning framework, FeatLLM, which employs LLMs as feature engineers to produce an input data set that is optimally suited for tabular predictions. The generated features are used to infer class likelihood with a simple downstream machine learning model, such as linear regression, yielding high-performance few-shot learning. The proposed FeatLLM framework uses only this simple predictive model with the discovered features at inference time. Compared to existing LLM-based approaches, FeatLLM eliminates the need to send queries to the LLM for each sample at inference time. Moreover, it requires only API-level access to LLMs and overcomes prompt size limitations. As demonstrated across numerous tabular datasets from a wide range of domains, FeatLLM generates high-quality rules, significantly outperforming alternatives such as TabLLM and STUNT (by 10% on average).

Paper Structure (46 sections, 2 equations, 12 figures, 16 tables)

This paper contains 46 sections, 2 equations, 12 figures, 16 tables.

Introduction
Related Work
Few-Shot Learning with Tabular Data
Language-Interfaced Tabular Learning
Methods
Problem formulation.
Prompt Design for Extracting Rules
Basic information description.
Reasoning instruction.
Response instruction.
Inferring Class Likelihood via Rules
Parsing rules for feature generation.
Inferring class likelihood.
Ensembling with bagging.
Experiments
...and 31 more sections

Figures (12)

Figure 1: Illustration of FeatLLM. FeatLLM extracts rules for each class, utilizing prior knowledge and few-shot examples. These rules are then parsed and applied to create binary features for data samples. A linear layer is trained on these binary features to estimate class likelihoods. This procedure is repeated multiple times for ensembling.
Figure 2: Prompt for rule extraction. Text in orange provides basic information description; blue text outlines reasoning instruction; and yellow text details response instruction.
Figure 3: Prompt for parsing rules. This prompt incorporates the rules generated in the previous stage, placed within the $<$Conditions$>$ section.
Figure 4: Performance comparison between conventional tabular baselines and FeatLLM. The AUC averaged over all datasets is reported for each number of shots.
Figure 5: Visualization of the performance impact of spurious correlations. The results show the models' performance (AUC) each time a noisy column from the Adult dataset is added to the original Heart dataset. XGBoost is excluded here due to its lower performance.
...and 7 more figures
