MOPrompt: Multi-objective Semantic Evolution for Prompt Optimization

Sara Câmara, Eduardo Luz, Valéria Carvalho, Ivan Meneghini, Gladston Moreira

TL;DR

MOPrompt tackles prompt engineering for LLMs by formulating it as a bi-objective optimization problem: minimize token usage and maximize accuracy. It uses an evolutionary multi-objective optimization (EMO) framework with LLM-based genetic operators to explore the Pareto front of prompts, so practitioners can select the trade-off that suits their deployment. Evaluated on a Portuguese sentiment analysis task with Gemma-2B and Sabiazinho-3, MOPrompt outperforms a strong single-objective baseline and, for Sabiazinho-3, reduces token length by 31% without lowering peak accuracy. The work provides empirical evidence that prompting strategy and model choice interact with the optimization dynamics, and it highlights avenues for extending the approach to more tasks and increasing prompt diversity.

Abstract

Prompt engineering is crucial for unlocking the potential of Large Language Models (LLMs). Because manual prompt design is often complex, non-intuitive, and time-consuming, automatic prompt optimization has emerged as a research area. A significant challenge in prompt optimization, however, is managing the inherent trade-off between task performance, such as accuracy, and context size. Most existing automated methods focus on a single objective, typically performance, and therefore fail to explore the spectrum of trade-offs between efficiency and effectiveness. This paper introduces MOPrompt, a novel Evolutionary Multi-objective Optimization (EMO) framework designed to optimize prompts for accuracy and context size (measured in tokens) simultaneously. Our framework maps the Pareto front of prompt solutions, presenting practitioners with a set of trade-offs between context size and performance, a crucial tool for deploying LLMs in real-world applications. We evaluate MOPrompt on a sentiment analysis task in Portuguese, using Gemma-2B and Sabiazinho-3 as evaluation models. Our findings show that MOPrompt substantially outperforms the baseline framework. For the Sabiazinho-3 model, MOPrompt identifies a prompt that achieves the same peak accuracy (0.97) as the best baseline solution, but with a 31% reduction in token length.
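To make the notion of a Pareto front over prompts concrete, the following is a minimal sketch of non-dominated filtering on two objectives (accuracy to maximize, token count to minimize). It is not the MOPrompt implementation: the `Candidate` type, the function names, and all accuracy and token values are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical evaluated prompt (illustrative structure only)."""
    prompt: str
    accuracy: float  # objective 1: higher is better
    tokens: int      # objective 2: lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if `a` is at least as good as `b` on both objectives
    and strictly better on at least one of them."""
    no_worse = a.accuracy >= b.accuracy and a.tokens <= b.tokens
    strictly_better = a.accuracy > b.accuracy or a.tokens < b.tokens
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up values for illustration; these are not results from the paper.
candidates = [
    Candidate("A", accuracy=0.95, tokens=120),
    Candidate("B", accuracy=0.95, tokens=80),
    Candidate("C", accuracy=0.88, tokens=45),
    Candidate("D", accuracy=0.80, tokens=60),
]

front = pareto_front(candidates)
for c in front:
    print(c.prompt, c.accuracy, c.tokens)
```

In this toy set, prompt A is dominated by B (same accuracy, fewer tokens) and D is dominated by C (higher accuracy, fewer tokens), so only B and C remain as trade-off options. A practitioner would then choose between them according to how much accuracy they are willing to give up for a shorter prompt.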

Paper Structure

This paper contains 24 sections, 2 equations, 1 figure, and 3 tables.

Figures (1)

  • Figure 1: Evolution of the Pareto front at generations 0, 5, and 10 when running the MOPrompt framework with the "few-shot" strategy.