Evaluating Prompt Engineering Strategies for Sentiment Control in AI-Generated Texts

Kerstin Sahler; Sophie Jentzsch

Evaluating Prompt Engineering Strategies for Sentiment Control in AI-Generated Texts

Kerstin Sahler, Sophie Jentzsch

TL;DR

This work systematically evaluates prompt engineering as a resource-efficient method to steer sentiment in LLM-generated text, using Ekman’s six emotions and a DistilRoBERTa emotion classifier to compare Vanilla, Zero-Shot, Zero-Shot CoT, Few-Shot, and CoT prompts against fine-tuning. The study finds that Few-Shot prompts with carefully crafted human-written examples yield the strongest emotion steering, outperforming even a fine-tuned baseline in several cases, while Zero-Shot approaches offer a lightweight but sometimes weaker alternative. CoT prompts show mixed results, with reasoning text occasionally aiding but often underperforming relative to Few-Shot and Zero-Shot strategies. The results highlight the practical value of prompt design for emotion-adaptive AI, especially in data-limited contexts, while acknowledging linguistic scope, evaluation methods, and stylistic transfer as important avenues for future work.

Abstract

The groundbreaking capabilities of Large Language Models (LLMs) offer new opportunities for enhancing human-computer interaction through emotion-adaptive Artificial Intelligence (AI). However, deliberately controlling the sentiment in these systems remains challenging. The present study investigates the potential of prompt engineering for controlling sentiment in LLM-generated text, providing a resource-sensitive and accessible alternative to existing methods. Using Ekman's six basic emotions (e.g., joy, disgust), we examine various prompting techniques, including Zero-Shot and Chain-of-Thought prompting using gpt-3.5-turbo, and compare it to fine-tuning. Our results indicate that prompt engineering effectively steers emotions in AI-generated texts, offering a practical and cost-effective alternative to fine-tuning, especially in data-constrained settings. In this regard, Few-Shot prompting with human-written examples was the most effective among other techniques, likely due to the additional task-specific guidance. The findings contribute valuable insights towards developing emotion-adaptive AI systems.

Evaluating Prompt Engineering Strategies for Sentiment Control in AI-Generated Texts

TL;DR

Abstract

Paper Structure (32 sections, 4 figures, 2 tables)

This paper contains 32 sections, 4 figures, 2 tables.

Introduction
Related Work
Method
Vanilla Prompt
Zero-Shot Prompts
Zero-Shot CoT Prompts
Few-Shot Prompts
Chain-of-Thought Prompts
Experiments
Prompt Engineering Experiments
Base Model
Example Development
Reasoning Texts Development
Evaluation
Fine-Tuning Experiments
...and 17 more sections

Figures (4)

Figure 1: Experimental Pipeline. Comparing four established prompting techniques: (1) Zero-Shot, (2) Zero-Shot Chain-of-Thought, (3) Few-Shot, and (4) Chain-of-Thought prompting. For each step, different elements were optimized. The final result is passed on as a starting point for the next technique. Vanilla prompt serves as a baseline.
Figure 2: General Prompting Scheme. Factual or subjective queries were combined with the instruction and target emotion in the user prompt. Depending on the approach, additional elements such as a system prompt (for adding personas) or examples (e.g., for Few-Shot prompts) were included in the message parameter and processed by OpenAI's gpt-3.5-turbo. Finally, the model responses were evaluated by measuring the presence of the target emotion.
Figure 3: Overview over all Emotion Scores. The Emotion Score was determined for each tested approach, with results shown separately per query type (factual or subjective) and as a combined value.
Figure 4: Text Quality in Model Responses (Graph). The textual quality results of the highest-rated approach of each technique, based on the Emotion Score, are illustrated. All metrics are also reported for the baseline, except the Emotion Score, which cannot be calculated for the Vanilla prompt. For the Flesh Reading Ease Score, a higher value indicates increased readability, meaning the text is more simple.

Evaluating Prompt Engineering Strategies for Sentiment Control in AI-Generated Texts

TL;DR

Abstract

Evaluating Prompt Engineering Strategies for Sentiment Control in AI-Generated Texts

Authors

TL;DR

Abstract

Table of Contents

Figures (4)