ELSA: A Style Aligned Dataset for Emotionally Intelligent Language Generation

Vishal Gandhi, Sagar Gandhi

TL;DR

The paper addresses the need for emotion-conditioned language generation data that combines fine-grained emotional granularity with diverse stylistic contexts. It introduces ELSA, a dataset built by mapping dair-ai's six coarse emotions to GoEmotions' fine-grained categories and generating stylistically varied rewrites (conversational, formal, poetic, narrative) via LLM augmentation, followed by automated quality checks. Comprehensive metrics (embedding variance, average emotion distance, readability, Distinct-n, Self-BLEU, perplexity, cosine similarity) demonstrate semantic stability alongside meaningful emotional and stylistic variation. ELSA enables more precise emotional control, style-adaptive generation, interpretability studies, affective computing, and robust benchmarking, with future work focusing on bias mitigation and further fluency optimization in generated text.
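The TL;DR describes two construction steps. First, each of dair-ai's six coarse labels is expanded into GoEmotions' fine-grained categories. Second, a stylistic rewrite is requested for each fine-grained emotion in each of the four styles. The sketch below shows how that expansion grid could be organized. The `COARSE_TO_FINE` table, the prompt wording, and the function `build_rewrite_requests` are hypothetical and for illustration only. The paper's actual mapping, prompts, and LLM calls are not reproduced here.

```python
from itertools import product

# HYPOTHETICAL coarse-to-fine mapping (dair-ai label -> GoEmotions labels).
# Illustrative only; the mapping used to build ELSA may differ.
COARSE_TO_FINE = {
    "sadness": ["sadness", "grief", "disappointment", "remorse"],
    "joy": ["joy", "amusement", "excitement", "optimism", "pride", "relief"],
    "love": ["love", "caring", "admiration", "gratitude"],
    "anger": ["anger", "annoyance", "disapproval", "disgust"],
    "fear": ["fear", "nervousness"],
    "surprise": ["surprise", "realization", "confusion", "curiosity"],
}

# The four rewrite styles named in the paper.
STYLES = ["conversational", "formal", "poetic", "narrative"]


def build_rewrite_requests(sentence, coarse_label):
    """Expand one coarse-labelled sentence into one rewrite request per
    (fine-grained emotion, style) pair. The prompt text is a placeholder."""
    fine_labels = COARSE_TO_FINE[coarse_label]
    requests = []
    for fine, style in product(fine_labels, STYLES):
        prompt = (
            f"Rewrite the sentence in a {style} style so that it expresses "
            f"{fine}, keeping the original meaning.\nSentence: {sentence}"
        )
        requests.append({"emotion": fine, "style": style, "prompt": prompt})
    return requests
```

Under this hypothetical table, a `sadness` example expands to 4 fine-grained labels x 4 styles = 16 prompts. Each prompt would then be sent to an LLM, and the output would pass through the automated quality checks before it enters the dataset.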

Abstract

Advancements in emotion-aware language processing increasingly shape vital NLP applications, ranging from conversational AI and affective computing to computational psychology and creative content generation. Existing emotion datasets either lack emotional granularity or fail to capture the necessary stylistic diversity, which limits progress on effective emotion-conditioned text generation systems. To bridge this crucial gap between granularity and style diversity, this paper introduces ELSA (Emotion and Language Style Alignment Dataset), a novel, systematically constructed dataset that leverages fine-grained emotion taxonomies adapted from existing sources such as the dair-ai emotion dataset and the GoEmotions taxonomy. The dataset comprises multiple emotionally nuanced variations of original sentences, regenerated with advanced large language models (LLMs) across distinct contextual styles: conversational, formal, poetic, and narrative. Rigorous computational evaluation using metrics such as perplexity, embedding variance, readability, lexical diversity, and semantic coherence validates the dataset's emotional authenticity, linguistic fluency, and textual diversity. Comprehensive metric analyses affirm its potential to support deeper explorations into emotion-conditioned, style-adaptive text generation. By enabling precision-tuned, emotionally nuanced language modeling, our dataset creates fertile ground for research on fine-grained emotional control, prompt-driven explanation, interpretability, and style-adaptive expressive language generation with LLMs.
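The abstract lists lexical diversity, semantic coherence, and embedding variance among the evaluation metrics. The sketch below shows one common way to compute these three in plain Python. It does not reproduce the paper's exact formulas, so each definition here is an assumption:

- Distinct-n is the number of unique n-grams divided by the total number of n-grams, using lowercase whitespace tokenization.
- Semantic coherence is the cosine similarity between an original sentence's embedding and its rewrite's embedding.
- Embedding variance is the mean squared distance of each embedding from the centroid.

In practice, the vectors would come from a sentence encoder. The toy vectors in the usage note are placeholders.

```python
import math


def distinct_n(texts, n):
    """Distinct-n: unique n-grams / total n-grams, pooled over all texts.
    Assumes lowercase whitespace tokenization."""
    total = 0
    unique = set()
    for text in texts:
        tokens = text.lower().split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def embedding_variance(vectors):
    """Assumed definition: mean squared Euclidean distance from the centroid."""
    dim = len(vectors[0])
    centroid = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return sum(
        sum((v[d] - centroid[d]) ** 2 for d in range(dim)) for v in vectors
    ) / len(vectors)
```

For example, with the texts "i feel so alone", "i feel so tired", and "alone again tonight", there are 7 unique unigrams out of 11 in total, so Distinct-1 is about 0.64. Distinct-2 is 6/8 = 0.75. Under the usual reading, a higher Distinct-n and a lower Self-BLEU indicate more lexical variety. A high original-to-rewrite cosine similarity indicates that meaning was preserved.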

Paper Structure

This paper contains 12 sections, 8 equations, and 2 tables.