metaTextGrad: Automatically optimizing language model optimizers
Guowei Xu, Mert Yuksekgonul, Carlos Guestrin, James Zou
TL;DR
metaTextGrad introduces a meta-optimization framework that automatically tailors LLM-based optimizers to specific tasks by learning task-aligned optimizer prompts and well-suited optimizer structures. It has two components, a meta prompt optimizer and a meta structure optimizer, and their combination improves over baselines across diverse benchmarks (an average absolute improvement of up to 6% over the best baseline), with evidence of transfer across models and datasets. A theoretical bound supports the need for task-aligned meta-learning, while empirical results show gains in efficiency and accuracy, including cases where smaller models outperform larger zero-shot baselines. The work offers a practical path to more reliable, task-aware LLM optimization and suggests future directions in learning the meta-optimizer itself and expanding optimization parameterizations.
Abstract
Large language models (LLMs) are increasingly used in learning algorithms, evaluations, and optimization tasks. Recent studies have shown that using LLM-based optimizers to automatically optimize prompts, demonstrations, predictions, or other components can significantly enhance the performance of AI systems, as demonstrated by frameworks such as DSPy and TextGrad. However, these LLM-based optimizers are typically designed by hand, through manual design choices; the optimizers themselves are not optimized. Moreover, they are general purpose by design, so that they are useful to a broad audience, and are not tailored to specific tasks. To address these challenges, we propose metaTextGrad, a meta-optimizer that further enhances existing optimizers and aligns them with a given task. Our approach consists of two key components: a meta prompt optimizer and a meta structure optimizer. Combining the two significantly improves performance across multiple benchmarks, achieving an average absolute performance improvement of up to 6% compared to the best baseline.
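To make the idea of a meta prompt optimizer concrete, the following is a minimal, illustrative sketch and not the authors' implementation. It assumes the optimizer's behavior is controlled by a text prompt, and it replaces every LLM call with a deterministic stub: `toy_inner_optimizer`, `toy_propose`, and the keyword-based scoring are all hypothetical names and rules introduced only for this example. The outer loop searches over optimizer prompts and keeps whichever one makes the inner optimizer score best on a set of tasks. The meta structure optimizer is not shown.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for an LLM-based optimizer: given an optimizer prompt
# and a task id, return the score of the solution it would produce.
InnerOptimizer = Callable[[str, int], float]


def toy_inner_optimizer(optimizer_prompt: str, task: int) -> float:
    """Deterministic stub (assumption): the score rises with the number of
    task-relevant keywords that appear in the optimizer prompt."""
    keywords = ("step", "verify", "units")
    hits = sum(word in optimizer_prompt for word in keywords)
    return min(1.0, 0.4 + 0.2 * hits)


def evaluate(optimizer_prompt: str, tasks: Sequence[int],
             inner: InnerOptimizer) -> float:
    """Average score of the inner optimizer over the given tasks."""
    return sum(inner(optimizer_prompt, t) for t in tasks) / len(tasks)


def toy_propose(prompt: str) -> List[str]:
    """Stub for an LLM that proposes rewritten optimizer prompts."""
    return [
        prompt + " Work step by step.",
        prompt + " Always verify the answer.",
    ]


def meta_prompt_optimize(seed_prompt: str,
                         propose: Callable[[str], List[str]],
                         tasks: Sequence[int],
                         inner: InnerOptimizer,
                         rounds: int = 3) -> str:
    """Outer (meta) loop: keep the optimizer prompt with the best task score."""
    best_prompt = seed_prompt
    best_score = evaluate(best_prompt, tasks, inner)
    for _ in range(rounds):
        # Candidates are proposed once per round from the current best prompt.
        for candidate in propose(best_prompt):
            score = evaluate(candidate, tasks, inner)
            if score > best_score:  # accept strict improvements only
                best_prompt, best_score = candidate, score
    return best_prompt


tasks = [1, 2, 3]
seed = "Improve the solution."
best = meta_prompt_optimize(seed, toy_propose, tasks, toy_inner_optimizer)
print(best)
```

In a real system the stubs would be replaced by calls to an actual LLM-based optimizer and to a model that rewrites optimizer prompts, and the scores would come from held-out task data rather than keyword counts.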
