Table of Contents
Fetching ...

Multi-Sample Prompting and Actor-Critic Prompt Optimization for Diverse Synthetic Data Generation

Abdelkarim El-Hajjami, Camille Salinesi

Abstract

High-quality labeled datasets are fundamental for training and evaluating machine learning models, yet domains such as healthcare and Requirements Engineering (RE) face persistent barriers due to data scarcity, privacy constraints, or proprietary restrictions. While Large Language Models (LLMs) offer a promising avenue for Synthetic Data Generation (SDG), LLM-generated data tends to be repetitive and low in diversity, reducing its effectiveness for downstream tasks. Two approaches show potential for addressing this limitation: (1) multi-sample prompting, which generates multiple samples per prompt to reduce repetition, and (2) Prompt with Actor-Critic Editing (PACE), which iteratively refines prompts to maximize diversity. We integrate both mechanisms into Synthline, a Feature Model-based configurable synthetic data generator, and assess their effects on diversity and downstream utility across four RE classification tasks. Multi-sample prompting consistently improves both diversity and utility, with F1-score gains of 6 to 43.8 percentage points. PACE-based prompt optimization consistently improves lexical diversity but produces task-dependent utility effects, revealing the risks of optimizing for diversity alone. Most notably, synthetic data can match or surpass human-authored data for tasks where real labeled data is limited, with improvements of up to 15.4 percentage points in F1-score.

Multi-Sample Prompting and Actor-Critic Prompt Optimization for Diverse Synthetic Data Generation

Abstract

High-quality labeled datasets are fundamental for training and evaluating machine learning models, yet domains such as healthcare and Requirements Engineering (RE) face persistent barriers due to data scarcity, privacy constraints, or proprietary restrictions. While Large Language Models (LLMs) offer a promising avenue for Synthetic Data Generation (SDG), LLM-generated data tends to be repetitive and low in diversity, reducing its effectiveness for downstream tasks. Two approaches show potential for addressing this limitation: (1) multi-sample prompting, which generates multiple samples per prompt to reduce repetition, and (2) Prompt with Actor-Critic Editing (PACE), which iteratively refines prompts to maximize diversity. We integrate both mechanisms into Synthline, a Feature Model-based configurable synthetic data generator, and assess their effects on diversity and downstream utility across four RE classification tasks. Multi-sample prompting consistently improves both diversity and utility, with F1-score gains of 6 to 43.8 percentage points. PACE-based prompt optimization consistently improves lexical diversity but produces task-dependent utility effects, revealing the risks of optimizing for diversity alone. Most notably, synthetic data can match or surpass human-authored data for tasks where real labeled data is limited, with improvements of up to 15.4 percentage points in F1-score.

Paper Structure

This paper contains 28 sections, 2 figures, 5 tables.

Figures (2)

  • Figure 1: The PACE optimization process integrated into Synthline. The Actor LLM generates candidate samples from the current prompt; a scoring function evaluates the batch; the Critic LLM provides natural-language feedback; and the Actor produces a revised prompt. After the final iteration, the best-scoring prompt is forwarded to the main generation phase.
  • Figure 2: Synthline architecture with conditional PACE optimization branch.