MOPO: Multi-Objective Prompt Optimization for Affective Text Generation

Yarik Menchaca Resendiz, Roman Klinger

TL;DR

MOPO tackles the challenge of expressing affective content across domains by introducing a three-layer, self-optimizing framework that uses Pareto-based multi-objective optimization to produce a diverse set of high-performing prompts. By applying Combine and Paraphrase operations and selecting prompts via NSGA-II, MOPO delivers a Pareto front of prompts that balance multiple domain-specific emotion objectives, reducing the need for a separate optimization per objective. Empirical results across three emotion datasets and multiple LLMs show that MOPO yields substantial gains over single-objective baselines and even state-of-the-art prompt optimizers, while maintaining comparable text quality. The approach lets end users select prompts tailored to a context or opt for balanced prompts that generalize across domains, with potential applicability beyond affective text generation to other NLP tasks.

Abstract

How emotions are expressed depends on the context and domain. On X (formerly Twitter), for instance, an author might simply use the hashtag #anger, while in a news headline, emotions are typically written in a more polite, indirect manner. To enable conditional text generation models to create emotionally connotated texts that fit a domain, users need access to a parameter that allows them to choose the appropriate way to express an emotion. To achieve this, we introduce MOPO, a Multi-Objective Prompt Optimization methodology. MOPO optimizes prompts according to multiple objectives (which correspond here to the output probabilities assigned by emotion classifiers trained for different domains). In contrast to single-objective optimization, MOPO outputs a set of prompts, each with a different weighting of the multiple objectives. Users can then choose the most appropriate prompt for their context. We evaluate MOPO using three objectives, determined by various domain-specific emotion classifiers. MOPO improves performance by up to 15 pp across all objectives with a minimal loss (1-2 pp) for any single objective compared to single-objective optimization. These minor performance losses are offset by a broader generalization across multiple objectives, which is not possible with single-objective optimization. Additionally, MOPO reduces computational requirements by simultaneously optimizing for multiple objectives, eliminating separate optimization procedures for each objective.
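
The idea of returning a set of prompts with different trade-offs, rather than one winner, can be illustrated with a minimal Python sketch of Pareto (non-dominated) filtering. This is not the authors' implementation: the prompt names, the three score columns, and every number below are made-up placeholders, and the helpers `dominates`, `pareto_front`, `best_for_objective`, and `most_balanced` are hypothetical names introduced only for this example. The summary names NSGA-II for selection; NSGA-II additionally ranks candidates into successive fronts and uses a crowding distance, which this sketch leaves out.

```python
from typing import Dict, Tuple

Scores = Tuple[float, ...]  # one classifier probability per objective (higher is better)


def dominates(a: Scores, b: Scores) -> bool:
    """a dominates b if it is no worse on every objective and better on at least one."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: Dict[str, Scores]) -> Dict[str, Scores]:
    """Keep only the candidates that no other candidate dominates."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    }


def best_for_objective(front: Dict[str, Scores], index: int) -> str:
    """Pick the prompt with the highest score on a single objective."""
    return max(front, key=lambda name: front[name][index])


def most_balanced(front: Dict[str, Scores]) -> str:
    """Pick the prompt whose weakest objective is strongest (max-min rule, an assumption)."""
    return max(front, key=lambda name: min(front[name]))


# Illustrative placeholder scores for three objectives; not results from the paper.
candidates = {
    "prompt_a": (0.90, 0.40, 0.50),
    "prompt_b": (0.45, 0.92, 0.55),
    "prompt_c": (0.80, 0.82, 0.78),
    "prompt_d": (0.70, 0.35, 0.45),  # worse than prompt_a on every objective
}

front = pareto_front(candidates)
print(sorted(front))                 # prompts that survive the filter
print(best_for_objective(front, 0))  # specialist for the first objective
print(most_balanced(front))          # compromise across all objectives
```

A user who cares about one domain would take the specialist for that objective, while a user who wants a prompt that generalizes would take the balanced one. The max-min rule used for "balanced" here is only one possible choice.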

Paper Structure

This paper contains 33 sections, 6 figures, 17 tables, and 3 algorithms.

Figures (6)

  • Figure 1: Examples of prompt-based generated text. The prompts are optimized for two conflicting objectives: News Headlines and Social Media. The Emotion Fitness Score evaluates how well the text fulfills each objective. In the Single Objective section, prompts are optimized either for News Headlines (high score for news) or Social Media (high score for social media), leading to lower fitness scores in the other category. In contrast, Multi-Objective prompts optimize for both News Headlines and Social Media simultaneously, generating a range of high-performing options. Users can select the best-performing prompt for each objective or choose a balanced option (e.g., "Severe Weather Alert -- Stay Prepared", which fits 85% across all objectives).
  • Figure 2: Three layers of prompts in our MOPO approach for multi-objective prompt optimization for affective text generation.
  • Figure 3: Improvement in the 10 best-performing prompts from Generation 1 (dark blue) to Generation 10 (yellow). Most prompts reach a score of almost 1.
  • Figure 4: Improvement across generations of the best-performing prompts for the emotion joy, comparing two objectives at a time. In the last generation (yellow), most of the prompts score close to 1 (optimal performance).
  • Figure 5: Improvement across generations of the best-performing prompts, from Generation 1 to Generation 10, comparing two objectives at a time. In the last generation, most of the prompts score close to 1.
  • ...and 1 more figure