An Automatic Prompt Generation System for Tabular Data Tasks

Ashlesha Akella; Abhijit Manatkar; Brij Chavda; Hima Patel

TL;DR

This paper presents an auto-prompt generation system that works with multiple LLMs and requires minimal training. It proposes two novel methods: a Reinforcement Learning-based algorithm for identifying and sequencing task-relevant columns, and a cell-level similarity-based approach for enhancing few-shot example selection.

Abstract

Efficient processing of tabular data is important in various industries, especially when working with datasets containing a large number of columns. Large language models (LLMs) have demonstrated their ability on several tasks through carefully crafted prompts. However, creating effective prompts for tabular datasets is challenging due to the structured nature of the data and the need to manage numerous columns. This paper presents an auto-prompt generation system that works with multiple LLMs and requires minimal training. It proposes two novel methods: 1) a Reinforcement Learning-based algorithm for identifying and sequencing task-relevant columns, and 2) a cell-level similarity-based approach for enhancing few-shot example selection. Our approach has been extensively tested across 66 datasets, demonstrating improved performance in three downstream tasks (data imputation, error detection, and entity matching) using two distinct LLMs: Google flan-t5-xxl and Mixtral 8x7B.

Paper Structure (19 sections, 9 equations, 4 figures, 6 tables)

Introduction
Related Work
Motivation
Method and Implementation
Architecture
Reinforcement Learning based Column Selection: RLCS
State and Action Representation
Policy Network
Cell-Level Similarity Measure based Few-shot Selection: CLFS
Prompt template
Datasets
Experimental Results
Conclusion and Future Work
Appendix
Reinforcement Learning Parameters
...and 4 more sections

Figures (4)

Figure 1: Example Prompt Template for Data Imputation task
Figure 2: Variations in accuracy across different combinations and permutations of manually selected columns for Data Imputation (DI) and Error Detection (ED). We collected accuracies for all possible permutations of the selected columns (per dataset and per task) and visualized the distributions of accuracies.
Figure 3: The architecture comprises three modules: RL agent Training Module for Column Selection, Build Prompt Module and Evaluation.
Figure 4: Reward accumulated by the RL agent in each training episode. The solid lines represent the average, and the shaded areas depict the highest and lowest test accuracy across 3 different seeds.
