MAPGD: Multi-Agent Prompt Gradient Descent for Collaborative Prompt Optimization

Yichen Han, Yuhang Han, Bojun Liu, Zhengpeng Zhou, Guanyu Liu, Zeng Zhang, Yang Yang, Wenli Wang, Isaac N Shi, Yunyan Zhang, Lewei He, Tianyu Shi

TL;DR

MAPGD reframes prompt optimization as a collaborative gradient-descent process among specialized agents, introducing Hypersphere-Constrained Gradient Clustering (HCGC) to enforce angular margins and Channel-Adaptive Agent Weighting (CAAW) to modulate agent influence. The framework enables semantic gradient embedding, conflict detection, and fusion, yielding robust, efficient prompt refinement with a sublinear convergence rate of $O\left(1/\sqrt{T}\right)$. Empirically, MAPGD outperforms single-agent baselines like ProTeGi on four classification tasks and transfers effectively to arithmetic reasoning benchmarks, while reducing token usage by about $8\%$. These results demonstrate that geometry-aware collaboration among diverse prompt dimensions can achieve reliable improvements in both discriminative and reasoning tasks.
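
To make the angular-margin idea concrete, here is a minimal greedy sketch in Python. It is an assumption, not the paper's HCGC procedure. Gradient embeddings are plain float lists projected onto the unit hypersphere. A gradient joins the nearest cluster only if its angle to that cluster's mean direction is at most `intra_max` (compactness). Afterwards, clusters whose mean directions are closer than `inter_min` are merged, so the surviving clusters stay well separated. The names `hcgc_cluster`, `intra_max`, and `inter_min` and their default values are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot project a zero vector onto the hypersphere")
    return [x / norm for x in v]


def angle(u: Vector, v: Vector) -> float:
    """Angle in radians between two unit vectors (dot product clamped for float safety)."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))


def mean_direction(points: List[Vector]) -> Vector:
    """Normalized sum of unit vectors: the cluster's mean direction on the sphere."""
    return normalize([sum(col) for col in zip(*points)])


def hcgc_cluster(embeddings: List[Sequence[float]],
                 intra_max: float = math.radians(30),
                 inter_min: float = math.radians(45)) -> List[List[int]]:
    """Hypothetical greedy angular clustering; returns lists of embedding indices."""
    points = [normalize(e) for e in embeddings]
    clusters: List[List[int]] = []
    centers: List[Vector] = []

    # Pass 1 (compactness): join the nearest cluster only within intra_max radians,
    # otherwise seed a new cluster with this gradient as its direction.
    for i, p in enumerate(points):
        best = min(range(len(centers)), key=lambda c: angle(p, centers[c]), default=None)
        if best is not None and angle(p, centers[best]) <= intra_max:
            clusters[best].append(i)
            centers[best] = mean_direction([points[j] for j in clusters[best]])
        else:
            clusters.append([i])
            centers.append(p)

    # Pass 2 (separation): merge any two clusters whose mean directions are
    # closer than inter_min, repeating until every pair respects the margin.
    merged = True
    while merged:
        merged = False
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if angle(centers[a], centers[b]) < inter_min:
                    clusters[a].extend(clusters.pop(b))
                    centers.pop(b)
                    centers[a] = mean_direction([points[j] for j in clusters[a]])
                    merged = True
                    break
            if merged:
                break
    return clusters
```

For example, `hcgc_cluster([[1, 0], [0.95, 0.05], [0, 1], [0.05, 0.95]])` places the two near-horizontal directions in one cluster and the two near-vertical directions in another. Merging is only one way to enforce separation; an implementation could instead reject or reassign boundary gradients.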

Abstract

Prompt engineering is crucial for fully leveraging large language models (LLMs), yet most existing optimization methods follow a single trajectory, resulting in limited adaptability, gradient conflicts, and high computational overhead. We propose MAPGD (Multi-Agent Prompt Gradient Descent), a novel framework that reconceptualizes prompt optimization as a collaborative process among specialized agents. Each agent focuses on a distinct refinement dimension, such as instruction clarity, example selection, format structure, or stylistic adaptation, and their contributions are coordinated through semantic gradient embedding, conflict detection, and fusion. To further enhance robustness and stability, MAPGD introduces two new mechanisms: Hypersphere-Constrained Gradient Clustering (HCGC), which enforces angular margin constraints for compact and well-separated clusters, and Channel-Adaptive Agent Weighting (CAAW), which dynamically reweights agent contributions based on validation performance. Experiments on classification and reasoning benchmarks show that MAPGD consistently surpasses single-agent and random baselines in both accuracy and efficiency. Ablation studies confirm the effectiveness of gradient fusion, agent specialization, and conflict resolution. Together, these components establish MAPGD as a unified, gradient-based, and interpretable framework for robust prompt optimization with theoretical convergence guarantees.
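
CAAW is described here only as reweighting agent contributions by validation performance. One plausible reading, sketched below under that assumption, is a temperature-scaled softmax over each agent's validation gain for the round, blended with the previous weights by a momentum term. The function `caaw_update` and its `temperature` and `momentum` parameters are hypothetical, and the agent names in the example simply mirror the refinement dimensions listed above.

```python
import math
from typing import Dict


def caaw_update(prev: Dict[str, float],
                gains: Dict[str, float],
                temperature: float = 0.1,
                momentum: float = 0.5) -> Dict[str, float]:
    """Hypothetical CAAW-style step: softmax over validation gains, blended with prior weights.

    prev:  current weight per agent (agents missing from prev start at 0).
    gains: validation improvement attributed to each agent this round.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not 0.0 <= momentum < 1.0:
        raise ValueError("momentum must be in [0, 1)")

    # Temperature-scaled softmax; subtracting the max keeps exp() from overflowing.
    top = max(gains.values())
    exp = {a: math.exp((g - top) / temperature) for a, g in gains.items()}
    z = sum(exp.values())
    target = {a: e / z for a, e in exp.items()}

    # Exponential smoothing damps the effect of a single noisy validation round.
    blended = {a: momentum * prev.get(a, 0.0) + (1.0 - momentum) * target[a] for a in gains}
    total = sum(blended.values())
    return {a: w / total for a, w in blended.items()}
```

Starting from uniform weights over `instruction`, `examples`, `format`, and `style`, a round where the instruction agent's proposals improved validation the most shifts weight toward it, while the momentum term keeps the other agents from being switched off after one evaluation. A lower `temperature` makes the reweighting sharper.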

Paper Structure

This paper contains 58 sections, 21 equations, 6 figures, 4 tables, and 5 algorithms.

Figures (6)

  • Figure 1: Overview of the Multi-Agent Prompt Gradient Descent for Collaborative Prompt Optimization.
  • Figure 2: Workflow of MAPGD. Starting from an initial prompt $p$, specialized agents propose diverse pseudo-gradients $\{g, g', g'', g''', \ldots\}$ (top). These gradients are semantically embedded and projected onto the unit hypersphere, then clustered with angular-margin constraints to form coherent groups (middle). Clustered directions are fused via LLM-driven synthesis into composite gradients $G$, which generate refined candidate prompts $\{P_1, P_2, \ldots, P_n\}$ (bottom). The best candidates are iteratively selected and fed back to continue the prompt optimization loop.
  • Figure 3: Details of Hypersphere-Constrained Gradient Clustering.
  • Figure 4: Illustration of agent-generated pseudo-gradients in a sentiment classification task. Blue gradients form a coherent cluster under the Instruction Specialist, while the red gradient represents a conflicting update. Green gradients from the Example Curator constitute another semantically compact cluster. This motivates the need for hypersphere-constrained clustering to enforce intra-cluster compactness and inter-cluster separation.
  • Figure 5: Test performance (F1 score) vs. API query budget per prompt candidate across four benchmark tasks.
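
The outer loop in the Figure 2 workflow (expand the current prompts, score the candidates, keep the best, repeat) can be sketched as a small beam search. This is an illustration under stated assumptions, not the paper's algorithm: `expand` is a hypothetical stand-in for the whole agent, embedding, clustering, and LLM-fusion pipeline, `score` stands in for validation evaluation such as F1, and `beam_width` and `iterations` are illustrative defaults rather than values from the paper.

```python
from typing import Callable, List


def optimize_prompt(initial: str,
                    expand: Callable[[str], List[str]],
                    score: Callable[[str], float],
                    beam_width: int = 4,
                    iterations: int = 3) -> str:
    """Hypothetical expand-score-select loop; returns the best prompt found."""
    if beam_width < 1:
        raise ValueError("beam_width must be at least 1")
    beam = [initial]
    for _ in range(iterations):
        # Candidates from every prompt in the beam; dict.fromkeys drops duplicates
        # while keeping first-seen order, so tied scores resolve deterministically.
        candidates = [c for p in beam for c in expand(p)]
        pool = list(dict.fromkeys(beam + candidates))
        # Score each candidate once per round (e.g. validation F1), keep the top beam_width.
        scores = {p: score(p) for p in pool}
        beam = sorted(pool, key=scores.__getitem__, reverse=True)[:beam_width]
    return beam[0]
```

With toy functions such as `expand = lambda p: [p + "a", p + "b"]` and `score = lambda p: p.count("b")`, three iterations from an empty prompt return `"bbb"`. Keeping the current beam in the pool means the returned prompt never scores below the initial one, at the cost of re-scoring survivors each round; a real run would cache scores to save API queries.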