Explaining Patterns in Data with Language Models via Interpretable Autoprompting

Chandan Singh; John X. Morris; Jyoti Aneja; Alexander M. Rush; Jianfeng Gao


TL;DR

This work tackles the problem of explaining data patterns by leveraging a fixed pretrained language model to generate natural-language dataset descriptions. The proposed interpretable autoprompting (iPrompt) algorithm iteratively proposes, ranks, and regenerates fluent prompts, optimizing for how well a prompt explains the data when used as a task prompt for an LLM. Across diverse datasets, iPrompt recovers ground-truth descriptions with higher interpretability and generalizes prompts across models, sometimes matching or surpassing human-written prompts in sentiment tasks. The approach also demonstrates potential for scientific discovery by extracting meaningful explanations from datasets in toxicology, proteomics, and neuroimaging (fMRI).

Abstract

Large language models (LLMs) have displayed an impressive ability to harness natural language to perform complex tasks. In this work, we explore whether we can leverage this learned ability to find and explain patterns in data. Specifically, given a pre-trained LLM and data examples, we introduce interpretable autoprompting (iPrompt), an algorithm that generates a natural-language string explaining the data. iPrompt alternates between generating explanations with an LLM and reranking them based on their performance when used as a prompt. Experiments on a wide range of datasets, from synthetic mathematics to natural-language understanding, show that iPrompt can yield meaningful insights by accurately finding ground-truth dataset descriptions. Moreover, the prompts produced by iPrompt are simultaneously human-interpretable and highly effective for generalization: on real-world sentiment classification datasets, iPrompt produces prompts that match or even improve upon human-written prompts for GPT-3. Finally, experiments with an fMRI dataset show the potential for iPrompt to aid in scientific discovery. All code for using the methods and data here is made available on GitHub.
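The propose, rerank, and regenerate loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `iprompt_sketch`, `toy_propose`, `toy_score`, and every parameter name are hypothetical, and the toy proposer and scorer stand in for LLM sampling and LLM prompt accuracy. The prompt template follows the one quoted in the Figure 2 caption.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (input string, expected output string)


def iprompt_sketch(
    examples: Sequence[Example],
    propose: Callable[[str, int], List[str]],
    score: Callable[[str, Sequence[Example]], float],
    n_candidates: int = 8,
    n_keep: int = 2,
    max_rounds: int = 10,
) -> Tuple[str, float]:
    """Propose prompts, rerank them, truncate and regenerate; stop when the
    best score no longer improves. All names here are illustrative."""
    candidates = propose("", n_candidates)
    best_prompt, best_score = "", float("-inf")
    for _ in range(max_rounds):
        unique = list(dict.fromkeys(candidates))  # de-duplicate, keep order
        if not unique:
            break
        # Rerank: score each candidate by how well it works as a prompt.
        scored = sorted(
            ((score(p, examples), p) for p in unique),
            key=lambda pair: pair[0],
            reverse=True,
        )
        top_score, top_prompt = scored[0]
        if top_score <= best_score:
            break  # performance stopped improving
        best_prompt, best_score = top_prompt, top_score
        # Keep the top prompts, truncate each one, and regenerate from the prefix.
        kept = [p for _, p in scored[:n_keep]]
        candidates = list(kept)
        for prompt in kept:
            words = prompt.split()
            prefix = " ".join(words[: max(1, len(words) // 2)])
            candidates += propose(prefix, max(1, n_candidates // n_keep))
    return best_prompt, best_score


# Toy stand-ins (assumptions): a real run would sample from an LLM and
# measure the LLM's accuracy when each candidate is used as a prompt.
OPS = {
    "sum": lambda a, b: a + b,
    "product": lambda a, b: a * b,
    "difference": lambda a, b: a - b,
    "maximum": max,
}


def toy_propose(prefix: str, n: int) -> List[str]:
    # Ignores the prefix; a real proposer would continue it with an LLM.
    return [f"Return the {name} of the inputs." for name in list(OPS)[:n]]


def toy_score(prompt: str, examples: Sequence[Example]) -> float:
    for name, op in OPS.items():
        if name in prompt:
            hits = sum(str(op(*map(int, x.split()))) == y for x, y in examples)
            return hits / len(examples)
    return 0.0


EXAMPLES = [("2 3", "6"), ("4 5", "20"), ("1 7", "7")]

if __name__ == "__main__":
    print(iprompt_sketch(EXAMPLES, toy_propose, toy_score))
```

Passing the proposer and scorer as callables keeps the loop independent of any particular model, which matches the abstract's framing of the LLM as fixed and pre-trained; only the two stand-in functions would need to change to use a real model.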


Paper Structure (35 sections, 1 equation, 13 figures, 16 tables)


Introduction
Related work
Prompting and autoprompting
Problems related to dataset explanation
Alternative methods for neural-network interpretation
Methods: Defining the task and approach
Task: Dataset Explanation
Datasets
Approach: iPrompt
Baseline: AutoPrompt
Baseline: Zero-shot suffix decoding
Proposed method: iPrompt
Experimental Setup
Language Models
Evaluation metrics
...and 20 more sections

Figures (13)

Figure 1: Interpretable autoprompting (iPrompt) inverts the standard prediction problem to instead find a natural language explanation of the data using a fixed, pre-trained large language model.
Figure 2: Prompt-based reranking depends on model size. Large models (GPT-J 6B and GPT-3) align prompts correctly to tasks. The model is given the prompt "Return the ___ of the inputs.", where the blank is filled in with the shown prompt keyword, before being queried for the output given two input numbers in a string. Darker indicates higher accuracy, and high accuracy along the diagonal indicates that the correct prompt induces the highest accuracy.
Figure 3: Overview of iPrompt. iPrompt first proposes candidate prompts, then ranks them based on their performance as a prompt, then truncates and regenerates them. This entire process is repeated until performance stops improving.
Figure 4: Comparison of model accuracy with correct prompt and iPrompt ability to find the correct prompt across each individual task (single-task MRR). Prompt recovery ability is dependent on the model's ability to perform the task.
Figure 5: Representations of the iPrompt-elicited concepts material (blue) and color (red) across the surface of the neocortex are spatially clustered and smooth. Only the top 10,000 best-predicted voxels are shown; the remaining voxels are shown in black. Only the right hemisphere is shown (both hemispheres, which show consistent smoothness, appear in the supplementary flatmap figure).
...and 8 more figures
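
The Figure 4 caption reports prompt recovery as single-task MRR (mean reciprocal rank). A minimal sketch of that metric is given below, assuming each run yields a ranked list of candidate prompts and that a candidate counts as correct on an exact string match; the matching criterion and the function names are assumptions, not details taken from the paper.

```python
from typing import Sequence, Tuple


def reciprocal_rank(ranked: Sequence[str], target: str) -> float:
    """Return 1/rank of the first candidate equal to target, or 0.0 if absent."""
    for rank, candidate in enumerate(ranked, start=1):
        if candidate == target:  # exact match is an assumption
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Tuple[Sequence[str], str]]) -> float:
    """Average the reciprocal rank over (ranked candidates, target) pairs."""
    if not runs:
        raise ValueError("need at least one run")
    return sum(reciprocal_rank(ranked, target) for ranked, target in runs) / len(runs)
```

A value of 1.0 means the correct description was ranked first in every run; a value near 0 means it was ranked low or never proposed.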
