C-MOP: Integrating Momentum and Boundary-Aware Clustering for Enhanced Prompt Evolution
Binwei Yan, Yifei Fu, Mingjian Zhu, Hanting Chen, Mingxuan Yuan, Yunhe Wang, Hailin Hu
TL;DR
The paper addresses the instability and noise in automatic prompt optimization for large language models by introducing C-MOP, a framework that stabilizes textual-gradient updates through Boundary-Aware Contrastive Sampling (BACS) and Momentum-Guided Semantic Clustering (MGSC). C-MOP employs a four-stage pipeline (Full-Batch Prediction, BACS, MGSC, and Gradient-Guided Evolution) with batch-level error signals, tripartite sampling (Hard Negatives, Anchors, Boundary Pairs), and a decaying momentum-based gradient pool to produce coherent optimization directions. Empirical results on BBH, GSM8K, Liar, and CFinBench demonstrate state-of-the-art performance and notable gains over baselines such as PromptWizard, with a 3B general model able to surpass some 70B domain-specific dense LLMs, highlighting the practical impact of the approach. Ablation studies confirm the complementary roles of BACS and MGSC and show that optimizer capacity significantly influences performance, indicating that higher-quality textual gradients enable more effective prompt evolution. The work suggests that robust, sample-aware, and momentum-informed prompt optimization can bridge the gap between general-purpose LLMs and domain-specific models, and it provides code for reproducibility.
Abstract
Automatic prompt optimization is a promising direction to boost the performance of Large Language Models (LLMs). However, existing methods often suffer from noisy and conflicting update signals. In this research, we propose C-MOP (Cluster-based Momentum Optimized Prompting), a framework that stabilizes optimization via Boundary-Aware Contrastive Sampling (BACS) and Momentum-Guided Semantic Clustering (MGSC). Specifically, BACS utilizes batch-level information to mine tripartite features--Hard Negatives, Anchors, and Boundary Pairs--to precisely characterize the typical representation and decision boundaries of positive and negative prompt samples. To resolve semantic conflicts, MGSC introduces a textual momentum mechanism with temporal decay that distills persistent consensus from fluctuating gradients across iterations. Extensive experiments demonstrate that C-MOP consistently outperforms SOTA baselines like PromptWizard and ProTeGi, yielding average gains of 1.58% and 3.35%, respectively. Notably, C-MOP enables a general LLM with 3B activated parameters to surpass a 70B domain-specific dense LLM, highlighting its effectiveness in driving precise prompt evolution. The code is available at https://github.com/huawei-noah/noah-research/tree/master/C-MOP.
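
The textual momentum mechanism with temporal decay mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `MomentumPool`, the parameters `decay` and `min_weight`, and the use of exact string matching in place of semantic clustering are all assumptions made here for illustration. The sketch only shows the general idea that feedback recurring across iterations accumulates weight, while one-off feedback fades and is eventually dropped.

```python
from dataclasses import dataclass, field


@dataclass
class MomentumPool:
    """Hypothetical pool of textual gradients with temporal decay.

    Assumption: gradients are plain strings and identical strings count as
    the same direction. A real system would group them semantically.
    """

    decay: float = 0.5        # assumed per-iteration decay factor
    min_weight: float = 0.1   # assumed pruning threshold
    weights: dict = field(default_factory=dict)

    def step(self, new_gradients):
        """Decay all stored weights, add this iteration's gradients, prune."""
        # Older feedback loses influence at every iteration.
        self.weights = {g: w * self.decay for g, w in self.weights.items()}
        # Fresh feedback enters with unit weight; repeats accumulate.
        for g in new_gradients:
            self.weights[g] = self.weights.get(g, 0.0) + 1.0
        # Drop feedback whose weight has decayed below the threshold.
        self.weights = {
            g: w for g, w in self.weights.items() if w >= self.min_weight
        }

    def top(self, k):
        """Return the k highest-weight gradients (ties broken by text)."""
        ranked = sorted(self.weights.items(), key=lambda kv: (-kv[1], kv[0]))
        return [g for g, _ in ranked[:k]]
```

In this toy version, a critique that appears in several consecutive iterations outranks one that appeared once, which is the "persistent consensus" behavior the abstract attributes to MGSC; how C-MOP actually weights and clusters gradients is specified in the paper and the linked code, not here.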
