Exploring Prompt Engineering Practices in the Enterprise

Michael Desmond; Michelle Brachman

TL;DR

The authors hypothesize that the way users iterate on their prompts offers insight into how they think prompting and models work, and into the kinds of support needed for more efficient prompt engineering.

Abstract

Interaction with Large Language Models (LLMs) is primarily carried out via prompting. A prompt is a natural language instruction designed to elicit certain behavior or output from a model. In theory, natural language prompts enable non-experts to interact with and leverage LLMs. However, for complex tasks and tasks with specific requirements, prompt design is not trivial. Creating effective prompts requires skill and knowledge, as well as significant iteration to determine model behavior and guide the model toward a particular goal. We hypothesize that the way in which users iterate on their prompts can provide insight into how they think prompting and models work, as well as the kinds of support needed for more efficient prompt engineering. To better understand prompt engineering practices, we analyzed sessions of prompt editing behavior, categorizing the parts of prompts users iterated on and the types of changes they made. We discuss design implications and future directions based on these prompt engineering practices.

Paper Structure (22 sections, 6 figures, 2 tables)

This paper contains 22 sections, 6 figures, 2 tables.

Introduction
Related Work
Prompting Practices
Mental Models and Repair
Methods
Data Collection
Qualitative Analysis
Results
Prompt Engineering Sessions: High-level Editing Analysis
Prompt Editing Practices
Use cases
Frequency of Prompt Component and Edit Types
Multiple Edits
Rollbacks
Context
...and 7 more sections

Figures (6)

Figure 1: Duration of observed prompt editing sessions in minutes.
Figure 2: The size of change between successive prompts, represented as a similarity ratio. Values closer to 1 indicate similarity between successive prompts, 1 being an exact match, while a ratio of 0 indicates nothing in common.
Figure 3: The occurrence of parameter changes as a percentage of sessions in which the change was observed. Users primarily changed the target language model, the maximum number of tokens to generate, and repetition penalty. Stop sequence, temperature and decoding method were also commonly changed.
Figure 4: The number of models used per session.
Figure 5: Number of edits that focused on each of the prompt components. Users primarily edited context and, to a lesser extent, task instructions.
...and 1 more figures
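
The caption of Figure 2 describes a similarity ratio between successive prompts, where 1 is an exact match and 0 means nothing in common. This listing does not say how the ratio was computed, so the following is only a minimal sketch under an assumption: it uses `SequenceMatcher.ratio()` from Python's standard `difflib` module, which has the same range and endpoints. The function names and the example session are hypothetical and are not taken from the paper.

```python
from difflib import SequenceMatcher


def similarity_ratio(previous: str, current: str) -> float:
    """Return a similarity score in [0, 1] for two prompt versions.

    Assumption: character-level matching via difflib; the paper's
    actual metric may differ.
    """
    return SequenceMatcher(None, previous, current).ratio()


def successive_ratios(prompts: list[str]) -> list[float]:
    """Compare each prompt in a session with the one immediately before it."""
    return [similarity_ratio(a, b) for a, b in zip(prompts, prompts[1:])]


# Hypothetical session: three versions of a prompt, invented for illustration.
session = [
    "Summarize the following article.",
    "Summarize the following article in two sentences.",
    "Summarize the following article in two sentences.",
]
ratios = successive_ratios(session)
```

Here the first ratio falls strictly between 0 and 1 because the second prompt extends the first, and the second ratio is 1 because the prompt was resubmitted unchanged. A session of n prompts yields n - 1 ratios.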
