Show and Tell: Prompt Strategies for Style Control in Multi-Turn LLM Code Generation

Jeremiah Bohr

TL;DR

This study investigates how prompt design shapes code style in multi-turn LLM code generation, focusing on verbosity, documentation, and defensive patterns while maintaining high functional accuracy. It compares instruction-based, example-based, and combined prompts across a paired two-turn workflow and finds that combined prompts deliver the strongest initial compression and the best expansion discipline. Instruction-based prompts maintain compression during enhancement, whereas example-based prompts alone fail to constrain growth. The results highlight that style control and functional correctness are separable properties of prompting, with practical implications for designing reliable AI-assisted software development workflows.

Abstract

Language models generate functionally correct code that tends toward excessive verbosity, with elaborate documentation and defensive patterns that diverge from human baselines. Two prompting mechanisms have emerged for stylistic control: instruction-based prompts that articulate abstract directives, and example-based prompts that provide concrete code demonstrations. The core problem is whether stylistic constraints persist when models enhance initial implementations with additional features while maintaining high functional accuracy. Here we show that instruction-based, example-based, and combined prompts produce distinct patterns of initial control and expansion discipline over one enhancement turn. We manipulated system prompts across four conditions in a paired two-turn protocol where models first generated solutions to an intermediate Python task, then revised their code under general improvement directives, holding the user task fixed (N = 160 paired programs). Combined prompts produced the strongest initial compression and greatest expansion discipline. Instructions showed large initial effects and moderate expansion discipline. Examples showed modest initial effects with no expansion discipline. These results show that initial prompt effectiveness and expansion discipline are separate aspects of prompt design, and that combined approaches provide the most stable stylistic control in this two-turn workflow.
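To make the idea of "expansion discipline" concrete, here is a minimal sketch of how growth between the two turns of one paired program might be quantified. It is illustrative only: the names `StyleCounts`, `count_style`, and `growth_ratio` are hypothetical, and the line-count metric is an assumption rather than the measure used in the paper.

```python
from dataclasses import dataclass


@dataclass
class StyleCounts:
    code_lines: int      # non-blank lines that are not pure comments
    comment_lines: int   # lines whose first non-space character is '#'


def count_style(source: str) -> StyleCounts:
    """Count code and comment lines in a Python source string.

    Assumption: docstrings are counted as code lines, since this sketch
    only inspects the first non-space character of each line.
    """
    code = comments = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are ignored
        if stripped.startswith("#"):
            comments += 1
        else:
            code += 1
    return StyleCounts(code, comments)


def growth_ratio(turn1: str, turn2: str) -> float:
    """Ratio of non-blank lines in the turn 2 revision to the turn 1 draft."""
    t1 = count_style(turn1)
    t2 = count_style(turn2)
    total1 = t1.code_lines + t1.comment_lines
    total2 = t2.code_lines + t2.comment_lines
    if total1 == 0:
        raise ValueError("turn 1 program is empty")
    return total2 / total1


# Hypothetical paired programs for a single run.
turn1 = "def add(a, b):\n    return a + b\n"
turn2 = (
    "# Add two numbers.\n"
    "def add(a, b):\n"
    "    # Validate inputs.\n"
    "    if a is None or b is None:\n"
    "        raise ValueError('missing operand')\n"
    "    return a + b\n"
)
print(count_style(turn2))          # code and comment line counts for turn 2
print(growth_ratio(turn1, turn2))  # values near 1.0 mean little growth
```

Under this assumed metric, a condition with strong expansion discipline would keep the ratio close to 1.0 after the enhancement turn, while a condition with none would let it climb.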

Paper Structure

This paper contains 21 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Turn 1 Baseline Differences with Control.
  • Figure 2: Turn 1 to Turn 2 Trajectories (mean $\pm$ 95% CIs).