The Effect of Education in Prompt Engineering: Evidence from Journalists

Amirsiavosh Bashardoust, Yuanjun Feng, Dominique Geissler, Stefan Feuerriegel, Yash Raj Shrestha

TL;DR

This study investigates whether prompt engineering training improves professional users' interactions with large language models and the quality of their outputs. Using a counter-balanced field experiment with journalists, the authors assess effects on user experience, expert-rated accuracy, and non-expert reader perceptions across two scientific articles. The findings show that training raises perceived expertise but can lower perceived helpfulness, while accuracy outcomes are task-dependent and reader perceptions yield nuanced, largely non-significant effects. The work highlights the importance of task- and audience-specific AI literacy programs and suggests that effective prompt engineering requires careful alignment with professional standards and content complexity.

Abstract

Large language models (LLMs) are increasingly used in daily work. In this paper, we analyze whether training in prompt engineering can improve users' interactions with LLMs. To this end, we conducted a field experiment in which we asked journalists to write short texts before and after training in prompt engineering. We then analyzed the effect of training on three dimensions: (1) the user experience of journalists when interacting with LLMs, (2) the accuracy of the texts (assessed by a domain expert), and (3) reader perception of the texts on quality dimensions such as clarity and engagement (assessed by non-expert readers). Our results show: (1) The training improved journalists' perceived expertise but decreased the perceived helpfulness of LLM use. (2) The effect on accuracy varied with the difficulty of the task. (3) The impact of training on reader perception was mixed across text quality dimensions.

Paper Structure (36 sections, 2 equations, 8 figures, 5 tables)

Figures (8)

  • Figure 1: Overview of our study process.
  • Figure 2: Procedure of field experiment.
  • Figure 3: Participants' perceived expertise in using LLMs before and after training (N=29).
  • Figure 4: Participants' perceived helpfulness of LLMs before and after training (N=29).
  • Figure 5: The overall score measuring accuracy (as assessed by the domain expert). Whiskers refer to standard errors.
  • ...and 3 more figures