MAPRO: Recasting Multi-Agent Prompt Optimization as Maximum a Posteriori Inference

Zheyuan Zhang, Lin Ge, Hongjiang Li, Weicheng Zhu, Chuxu Zhang, Yanfang Ye

TL;DR

MAPRO reframes multi-agent prompt optimization as a Maximum a Posteriori (MAP) inference problem over a directed acyclic graph of agents, enabling principled joint optimization of agent prompts. It solves the resulting combinatorial problem with a language-guided max-product belief propagation algorithm and augments it with topology-aware credit assignment to propagate downstream feedback to upstream prompts. The approach achieves state-of-the-art performance across mathematical reasoning, question answering, and code generation benchmarks, surpassing manually engineered baselines and prior automated methods. Beyond empirical gains, MAPRO offers general design guidelines for building more reliable and principled multi-agent systems in practice, by explicitly modeling prompt interactions and credit signals within the MAS topology.

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks, and LLM-based agents further extend these abilities to various practical workflows. While recent progress shows that multi-agent systems (MAS) can outperform single agents by coordinating specialized roles, designing effective MAS remains difficult because of prompt sensitivity and the compounded instability that MAS create. Recent efforts in automated prompt design have reduced the manual effort involved, but multi-agent prompt optimization remains largely unexplored. An exponentially expanding search space and ambiguous credit assignment together make systematic design intractable without principled methods. We therefore introduce Multi-Agent PRompt Optimization (MAPRO), a four-stage framework that first formulates MAS prompt optimization as a Maximum a Posteriori (MAP) inference problem and solves it using a language-guided variant of the max-product belief propagation algorithm. To address credit assignment and update the system iteratively, MAPRO employs a topology-aware refinement mechanism that integrates execution feedback and downstream blame to selectively update agent prompts. Through this process, MAPRO progressively converges to a coordinated set of agent-specific prompt policies. Across benchmarks covering a range of tasks, MAPRO achieves state-of-the-art performance, consistently surpassing manually engineered baselines and recent automated alternatives. Beyond performance, our MAP-based formulation also delivers general guidelines for building more reliable and principled multi-agent systems in the future.
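
To make the MAP framing concrete, the following is a minimal sketch of plain numeric max-product inference on a chain of agents. It is not the paper's language-guided variant and does not cover general DAG topologies. The function name `map_prompts_on_chain` and the score tables `node_scores` and `edge_scores` are hypothetical; they are assumed to hold positive quality scores for each candidate prompt and compatibility scores for each pair of adjacent candidates.

```python
from typing import List, Tuple


def map_prompts_on_chain(
    node_scores: List[List[float]],
    edge_scores: List[List[List[float]]],
) -> Tuple[List[int], float]:
    """Max-product (Viterbi-style) MAP inference over a chain of agents.

    node_scores[i][a]    : assumed score of candidate prompt a for agent i.
    edge_scores[i][a][b] : assumed compatibility of candidate a (agent i)
                           with candidate b (agent i + 1).
    Returns the jointly best candidate index per agent and its joint score.
    """
    # best[i][b]: best joint score over agents 0..i with agent i fixed to b.
    best = [list(node_scores[0])]
    back: List[List[int]] = []  # back[i-1][b]: best predecessor choice for b

    for i in range(1, len(node_scores)):
        cur, ptr = [], []
        for b, node_b in enumerate(node_scores[i]):
            # Forward message: maximize over the upstream agent's candidates.
            cands = [
                best[i - 1][a] * edge_scores[i - 1][a][b]
                for a in range(len(node_scores[i - 1]))
            ]
            a_star = max(range(len(cands)), key=cands.__getitem__)
            cur.append(cands[a_star] * node_b)
            ptr.append(a_star)
        best.append(cur)
        back.append(ptr)

    # Backtrack from the best final candidate to recover the joint assignment.
    last = max(range(len(best[-1])), key=best[-1].__getitem__)
    assignment = [last]
    for ptr in reversed(back):
        assignment.append(ptr[assignment[-1]])
    assignment.reverse()
    return assignment, best[-1][last]


# Illustrative numbers only: three agents, two candidate prompts each.
node_scores = [[0.6, 0.4], [0.5, 0.5], [0.3, 0.7]]
edge_scores = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]
assignment, score = map_prompts_on_chain(node_scores, edge_scores)
```

With these illustrative numbers, picking each agent's best prompt in isolation would select candidate 0 for the first agent, whereas the joint optimum selects candidate 1 for every agent. This is the kind of prompt interaction that joint inference is meant to capture.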

Paper Structure

This paper contains 34 sections, 22 equations, 8 figures, 4 tables, 1 algorithm.

Figures (8)

  • Figure 1: Prompt quality governs agent reliability: (top-left) vague, manually revised prompts are error-prone and costly; (top-right) automated prompt optimization searches for better prompts and produces correct answers; (bottom-left) the optimizer explores and selects among candidate rewrites; (bottom-right) performance improves over iterations, surpassing hand-tuned prompting.
  • Figure 2: The overall framework of MAPRO: (a) existing methods of prompt optimization for MAS and their drawbacks; (b) the overall framework of MAPRO compared with existing methods; (c) the detailed modules used in MAPRO.
  • Figure 3: Optimization trajectories on the MBPP+ benchmark. We report the first ten optimization iterations using the chain MAS framework. MAPRO exhibits a more consistent and steady improvement compared to alternative methods.
  • Figure 4: Unified Reward Modeling Prompts for MAPRO: node-level (left) and edge-level (right), merging each module’s header and reward prefix verbatim.
  • Figure 5: Feedback system prompts in MAPRO (for coding tasks): global feedback, local feedback, and mutation strategy.
  • ...and 3 more figures