Analogy Generation by Prompting Large Language Models: A Case Study of InstructGPT

Bhavya Bhavya; Jinjun Xiong; Chengxiang Zhai

TL;DR

This paper investigates analogy generation by prompting large language models for two tasks: Analogous Concept Generation (ACG) and Analogy Explanation Generation (AEG). It shows that InstructGPT can produce meaningful analogies when prompts are precise and temperature is low, with performance strongly influenced by model size. The authors introduce new datasets and a thorough robustness analysis across prompts, temperatures, and spelling perturbations, plus a human evaluation that reveals larger models approach or reach human performance for ACG but still struggle with AEG. The work demonstrates the promise and limitations of prompt-driven analogy generation and outlines future directions for improving cross-domain generalization and evaluative methods.

Abstract

We propose a novel application of prompting Pre-trained Language Models (PLMs) to generate analogies and study how to design effective prompts for two task settings: generating a source concept analogous to a given target concept (also known as Analogous Concept Generation, or ACG), and generating an explanation of the similarity between a given target concept and source concept (also known as Analogous Explanation Generation, or AEG). We found that it is feasible to prompt InstructGPT to generate meaningful analogies, and that the best prompts tend to be precise imperative statements, especially with a low temperature setting. We also systematically analyzed the sensitivity of the InstructGPT model to prompt design, temperature, and injected spelling errors, and found that the model is particularly sensitive to certain variations (e.g., questions vs. imperative statements). Further, we conducted a human evaluation of 1.4k of the generated analogies and found that the quality of generations varies substantially by model size. The largest InstructGPT model can achieve human-level performance at generating meaningful analogies for a given target, while there is still room for improvement on the AEG task.
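The prompt variations and spelling perturbations mentioned in the abstract can be illustrated with a minimal sketch. The template wordings, the function names `build_prompt` and `inject_spelling_errors`, and the adjacent-letter swap used as the perturbation are all assumptions made for illustration; the paper's actual prompts and error-injection procedure may differ.

```python
import random

# Hypothetical prompt templates (assumed wording, not taken from the paper).
ACG_IMPERATIVE = "Explain {target} using an analogy."
ACG_QUESTION = "What is analogous to {target}?"
AEG_IMPERATIVE = "Explain how {target} is analogous to {source}."


def build_prompt(template: str, **concepts: str) -> str:
    """Fill a template with the target (and optionally source) concept."""
    return template.format(**concepts)


def inject_spelling_errors(text: str, rate: float, seed: int = 0) -> str:
    """Swap adjacent letters with probability `rate` per eligible position.

    This is one simple, assumed way to create synthetic spelling errors.
    """
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        both_letters = chars[i].isalpha() and chars[i + 1].isalpha()
        if both_letters and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so each letter moves at most once
        else:
            i += 1
    return "".join(chars)


if __name__ == "__main__":
    clean = build_prompt(AEG_IMPERATIVE, target="a cell", source="a factory")
    print(clean)
    print(inject_spelling_errors(clean, rate=0.1, seed=42))
```

Comparing model outputs for the clean prompt, the question-style prompt, and the perturbed prompt at a fixed temperature is one way such a robustness check could be organized.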

Paper Structure (27 sections, 6 figures, 20 tables)

This paper contains 27 sections, 6 figures, 20 tables.

Introduction
Related Work
Computational Models of Analogies
Prompting Language Models
Problem Formulation
Experiment Setup
Experiment Results
Feasibility Analysis
Robustness analyses
Analysis of prompts
Analysis of temperature
Analysis of synthetic spelling errors
Analysis of model size
Human evaluation
Annotation Setup
...and 12 more sections

Figures (6)

Figure 1: Kendall's Tau correlation between BLEURT scores of various prompts and temperatures in wsrc
Figure 2: Average performances of various InstructGPT models based on BLEURT scores
Figure 3: Kendall's Tau correlation between BLEURT scores of various prompts and temperatures in no_src
Figure 4: Pre-screening question for identifying qualified workers.
Figure 5: Sample interface for screening qualified workers.
...and 1 more figure
