ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization
Yinjie Wang, Ling Yang, Guohao Li, Mengdi Wang, Bryon Aragam
TL;DR
ScoreFlow presents a gradient-based framework for automated, adaptive optimization of multi-agent LLM workflows and introduces Score-DPO, a score-aware variant of direct preference optimization. Using code-based workflow representations and an operator library, ScoreFlow improves on existing baselines, both manual and automated, by 8.2% across six benchmarks spanning question answering, coding, and mathematical reasoning. The approach combines quantitative feedback with preference data to accelerate convergence, and it enables smaller models to surpass larger ones at lower inference cost. Theoretical analysis explains why integrating scores improves learning, and extensive ablations demonstrate adaptability across architectures and task types.
Abstract
Recent research has leveraged large language model multi-agent systems for complex problem-solving while trying to reduce the manual effort required to build them, driving the development of automated agent workflow optimization methods. However, existing methods remain inflexible due to representational limitations, a lack of adaptability, and poor scalability when relying on discrete optimization techniques. We address these challenges with ScoreFlow, a simple yet high-performance framework that leverages efficient gradient-based optimization in a continuous space. ScoreFlow incorporates Score-DPO, a novel variant of the direct preference optimization method that accounts for quantitative feedback. Across six benchmarks spanning question answering, coding, and mathematical reasoning, ScoreFlow achieves an 8.2% improvement over existing baselines. Moreover, it empowers smaller models to outperform larger ones with lower inference costs. Project: https://github.com/Gen-Verse/ScoreFlow
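As a rough illustration of what a score-aware preference loss could look like, the Python sketch below computes the standard DPO loss for a single preferred/rejected pair and then scales it by the gap between the two evaluation scores. The weighting rule and all function and parameter names are assumptions made for illustration only; the abstract does not specify the Score-DPO objective, which may integrate scores differently.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (preferred, rejected) pair.

    logp_*     : log-probability of each response under the policy
    ref_logp_* : log-probability of each response under the frozen reference
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -log_sigmoid(margin)


def score_weighted_dpo_loss(logp_w: float, logp_l: float,
                            ref_logp_w: float, ref_logp_l: float,
                            score_w: float, score_l: float,
                            beta: float = 0.1) -> float:
    """Hypothetical score-aware variant (an assumption, not the paper's formula).

    Pairs whose evaluation scores differ more contribute more to the loss,
    so quantitative feedback shapes the update instead of only the ordering.
    """
    weight = max(score_w - score_l, 0.0)
    return weight * dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta)


if __name__ == "__main__":
    # A pair where the policy already favors the higher-scoring workflow.
    print(score_weighted_dpo_loss(-1.0, -3.0, -2.0, -2.0, score_w=0.9, score_l=0.4))
```

In this sketch a pair with nearly equal scores contributes almost nothing, while a pair with a large score gap dominates the update. That is one simple way to use quantitative feedback rather than only the binary preference.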
