SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback

Yaoning Yu, Ye Yu, Peiyan Zhang, Kai Wei, Haojing Luo, Haohan Wang

TL;DR

SIPDO introduces a closed-loop, data-centric approach to prompt optimization by jointly training a Synthetic Data Generator and an Auto Prompt Optimizer to identify and fix prompt weaknesses through progressively harder synthetic examples. The framework reframes prompt tuning as an adaptive curriculum that stress-tests prompts and guides iterative rewrites, supported by a theoretical guarantee on worst-case error under regularised data generation. Empirically, SIPDO yields robust improvements across diverse reasoning benchmarks (e.g., BIG-Bench, FOLIO, PrOntoQA, ProofWriter, MMLU) and remains competitive on challenging tasks such as ProofWriter, outperforming several leading baselines. The work demonstrates the practical value of integrating synthetic data generation with prompt refinement to make LLM-driven systems more robust and reliable, and points to domain-specific extensions and fully automated continual learning as future directions.

Abstract

Prompt quality plays a critical role in the performance of large language models (LLMs), motivating a growing body of work on prompt optimization. Most existing methods optimize prompts over a fixed dataset, assuming static input distributions and offering limited support for iterative improvement. We introduce SIPDO (Self-Improving Prompts through Data-Augmented Optimization), a closed-loop framework for prompt learning that integrates synthetic data generation into the optimization process. SIPDO couples a synthetic data generator with a prompt optimizer, where the generator produces new examples that reveal current prompt weaknesses and the optimizer incrementally refines the prompt in response. This feedback-driven loop enables systematic improvement of prompt performance without assuming access to external supervision or new tasks. Experiments across question answering and reasoning benchmarks show that SIPDO outperforms standard prompt tuning methods, highlighting the value of integrating data synthesis into prompt learning workflows.

Paper Structure

This paper contains 25 sections, 12 equations, 2 figures, 8 tables.

Figures (2)

  • Figure 1: Starting from the true data distribution $S$, the Data Generator (left) produces a synthetic question-answer pair at difficulty level $c$. The Auto Prompt Optimizer (right) evaluates the current prompt on this synthetic data via three sub-modules (error analysis, recommendation, and refinement) and outputs a revised prompt. The revised prompt is tested on the current failures and on all previously solved examples. If the prompt still makes errors, it returns to the Auto Prompt Optimizer for further refinement; if it passes, the loop moves on to the next sample (with higher $c$). The cycle repeats until no errors remain or the budget is exhausted, yielding a self-improved prompt.
  • Figure 2: Generated BIG-Bench Causal Judgment examples at different difficulty levels
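The control flow described in the Figure 1 caption can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the authors' implementation: in SIPDO the Data Generator, the target model, and the three optimizer sub-modules are LLM-driven, whereas here `generate_example`, `solve`, `error_analysis`, `recommend`, and `refine` are illustrative toy stand-ins, a "prompt" is assumed to be a list of instruction strings, and `budget` is assumed to count refinement steps. Only the loop structure (generate at level $c$, refine until the current and all previously solved examples pass, then raise $c$) mirrors the caption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Example:
    question: str
    answer: str
    difficulty: int  # difficulty level c from Figure 1


# --- Hypothetical toy stand-ins for the LLM-backed components ---------------
# The toy target model answers an example correctly only if the prompt
# contains a rule for that example's difficulty level.

def generate_example(c: int) -> Example:
    """Toy Data Generator: emit one synthetic QA pair at difficulty level c."""
    return Example(question=f"question at level {c}", answer=f"answer {c}", difficulty=c)


def solve(prompt: List[str], ex: Example) -> str:
    """Toy target model: correct iff the prompt covers ex.difficulty."""
    return ex.answer if f"rule for level {ex.difficulty}" in prompt else "unknown"


def error_analysis(failures: List[Example]) -> List[int]:
    """Sub-module 1: diagnose what the failures share (here, their levels)."""
    return sorted({ex.difficulty for ex in failures})


def recommend(levels: List[int]) -> List[str]:
    """Sub-module 2: propose one prompt edit per diagnosed weakness."""
    return [f"rule for level {c}" for c in levels]


def refine(prompt: List[str], recs: List[str]) -> List[str]:
    """Sub-module 3: rewrite the prompt by appending missing recommendations."""
    return prompt + [r for r in recs if r not in prompt]


# --- Closed loop following the Figure 1 caption ------------------------------
def sipdo_loop(prompt: List[str], max_difficulty: int,
               budget: int) -> Tuple[List[str], List[Example]]:
    solved: List[Example] = []
    refinements = 0
    for c in range(1, max_difficulty + 1):  # each new sample is harder
        ex = generate_example(c)
        while True:
            # Test on the current sample plus every previously solved example,
            # so a rewrite that fixes one case cannot silently break another.
            batch = solved + [ex]
            failures = [e for e in batch if solve(prompt, e) != e.answer]
            if not failures:
                solved.append(ex)  # passes: move on to the next sample
                break
            if refinements >= budget:
                return prompt, solved  # budget exhausted: stop early
            refinements += 1
            prompt = refine(prompt, recommend(error_analysis(failures)))
    return prompt, solved


final_prompt, solved = sipdo_loop(["Answer the question."], max_difficulty=3, budget=10)
print(final_prompt)
# ['Answer the question.', 'rule for level 1', 'rule for level 2', 'rule for level 3']
```

Re-testing on previously solved examples is what turns the loop into a cumulative curriculum: a revised prompt is accepted only if it does not regress on anything it already handled.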