What Should We Engineer in Prompts? Training Humans in Requirement-Driven LLM Use

Qianou Ma, Weirui Peng, Chenyang Yang, Hua Shen, Kenneth Koedinger, Tongshuang Wu

TL;DR

This work argues that the core challenge in enabling end-user programming with LLMs lies in articulating precise requirements. It introduces Requirement-Oriented Prompt Engineering (ROPE) and a deliberate-practice training/assessment suite that provides real-time, requirement-focused feedback to novices. In a randomized study, ROPE markedly improved prompt quality and LLM outputs compared to conventional prompt engineering training, with strong correlations between input requirements and resulting outputs. The findings suggest that while optimizers can help, explicit human requirements remain central and that ROPE generalizes to more capable reasoning LLMs, offering a path toward broader, reliable end-user tooling for LLM-driven applications.

Abstract

Prompting LLMs for complex tasks (e.g., building a trip advisor chatbot) needs humans to clearly articulate customized requirements (e.g., "start the response with a tl;dr"). However, existing prompt engineering instructions often lack focused training on requirement articulation and instead tend to emphasize increasingly automatable strategies (e.g., tricks like adding role-plays and "think step-by-step"). To address the gap, we introduce Requirement-Oriented Prompt Engineering (ROPE), a paradigm that focuses human attention on generating clear, complete requirements during prompting. We implement ROPE through an assessment and training suite that provides deliberate practice with LLM-generated feedback. In a randomized controlled experiment with 30 novices, ROPE significantly outperforms conventional prompt engineering training (20% vs. 1% gains), a gap that automatic prompt optimization cannot close. Furthermore, we demonstrate a direct correlation between the quality of input requirements and LLM outputs. Our work paves the way to empower more end-users to build complex LLM applications.

Paper Structure (41 sections, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Automatically refine a user's prompt for Trip Advisor using the optimizer Prompt Maker. Customized requirements still need to come from the human prompt, while the optimizer improves on non-requirement prompting aspects such as fluency, role plays, structures, etc.
  • Figure 2: Our envisioned ROPE paradigm.
  • Figure 3: Our ROPE training interface, with three types of feedback on requirement defects: (A) Chat-based hints on incomplete or inaccurate requirements, (B) Reference requirement examples to reinforce appropriately identified and expressed requirements, and (C) LLM output counterfactual to support user's reflection on incorrect or ambiguous requirements.
  • Figure 4: The pre-test and post-test overall scores between PE and ROPE conditions. (a) Overall scores for ROPE significantly improved from pre- to post-test; (b) ROPE achieved significantly higher learning gains (post $-$ pre) than PE (*** denotes $p < 0.001$).
  • Figure 5: The pre-test and post-test overall scores between PE and ROPE conditions and the original and optimized prompt versions. ROPE's learning gain remains significantly higher than PE's optimized gains (** denotes $p \leq 0.01$).