Leveraging AI-Generated Emotional Self-Voice to Nudge People towards their Ideal Selves

Cathy Mengying Fang, Phoebe Chua, Samantha Chan, Joanne Leong, Andria Bao, Pattie Maes

TL;DR

This paper investigates how to nudge individuals toward their ideal selves using Emotional Self Voice (ESV), a system that generates ideal-self responses in the user’s own voice via a three-step pipeline: text generation, emotionally expressive speech synthesis, and voice cloning. A between-subjects study with 60 participants compares ESV, text-only, and mental-imagination conditions on affect, resilience, confidence, motivation, and commitment in two goal-oriented scenarios. Findings show broad positive shifts in affect and motivational constructs across conditions, with ESV offering unique engagement and stronger increases in confidence and motivation, particularly for future habit formation. The work contributes to theory by linking self-discrepancy with self-voice as a behavioral intervention modality, demonstrates the feasibility of personalized AI-generated self-voices in real-world goal pursuit contexts, and discusses the associated ethical considerations.
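The three-step pipeline named above can be pictured as three composable stages. The following is a minimal sketch, not the authors' implementation: the names `IdealSelfRequest`, `run_esv_pipeline`, and the `stub_*` functions are hypothetical, the stages are placeholders rather than real generative models, and the idea that voice cloning takes the synthesized audio plus a recorded voice sample is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class IdealSelfRequest:
    """Hypothetical input bundle: what the user provides to the system."""
    scenario: str            # the situation the user describes
    ideal_traits: List[str]  # adjectives describing the ideal self
    voice_sample: bytes      # assumed: a recording of the user's voice


# Each stage is passed in as a callable so real models could be swapped in.
TextGenerator = Callable[[str, List[str]], str]
SpeechSynthesizer = Callable[[str], bytes]
VoiceCloner = Callable[[bytes, bytes], bytes]


def run_esv_pipeline(
    request: IdealSelfRequest,
    generate_text: TextGenerator,
    synthesize_speech: SpeechSynthesizer,
    clone_voice: VoiceCloner,
) -> Tuple[str, bytes]:
    """Run the three stages in order and return (text, self-voice audio)."""
    # Step 1: text generation in the style of the ideal self.
    text = generate_text(request.scenario, request.ideal_traits)
    # Step 2: emotionally expressive speech synthesis of that text.
    expressive_audio = synthesize_speech(text)
    # Step 3: voice cloning, so the speech is rendered in the user's voice.
    self_voice_audio = clone_voice(expressive_audio, request.voice_sample)
    return text, self_voice_audio


# Placeholder stages that only make the data flow visible.
def stub_generate_text(scenario: str, traits: List[str]) -> str:
    return f"As my {', '.join(traits)} self, here is how I respond to: {scenario}"


def stub_synthesize_speech(text: str) -> bytes:
    return text.encode("utf-8")


def stub_clone_voice(audio: bytes, voice_sample: bytes) -> bytes:
    return voice_sample + b"|" + audio
```

Passing the stages as callables keeps the ordering of the pipeline separate from any particular model choice, which the source does not specify here.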

Abstract

Emotions, shaped by past experiences, significantly influence decision-making and goal pursuit. Traditional cognitive-behavioral techniques for personal development rely on mental imagery to envision ideal selves, but may be less effective for individuals who struggle with visualization. This paper introduces Emotional Self-Voice (ESV), a novel system combining emotionally expressive language models and voice cloning technologies to render customized responses in the user's own voice. We investigate the potential of ESV to nudge individuals towards their ideal selves in a study with 60 participants. Across all three conditions (ESV, text-only, and mental imagination), we observed an increase in resilience, confidence, motivation, and goal commitment, and the ESV condition was perceived as uniquely engaging and personalized. We discuss the implications of designing generated self-voice systems as a personalized behavioral intervention for different scenarios.

Paper Structure (54 sections, 11 figures, 7 tables)

Figures (11)

  • Figure 1: An overview of the Emotional Self Voice system. The system consists of multiple generative models for synthesizing text and speech. The individual provides a scenario and the corresponding ideal self characteristics. Our system generates the text and audio response in the style of the ideal self and in the voice of the individual.
  • Figure 2: The user interface of the Emotional Self Voice system for the study. The top shows the scenario provided by the user. The user is asked to provide adjectives that describe the ideal self and then receives the generated text and self-voice response. The user can fine-tune the response by adjusting the parameters.
  • Figure 3: Example scenarios, ideal self attributes and generated ideal self responses for both scenarios.
  • Figure 4: An overview of the user study procedure. PsQ-i: Plymouth Sensory Imagery Questionnaire; SSC-S: Self-Compassion Scale; IPIP: International Personality Item Pool, which assesses the Big Five personality dimensions; BRT: Benchmark Resilience Tool; FSCQ: Future-Self Continuity Questionnaire.
  • Figure 5: Questionnaire results of the main research outcomes pre- and post-intervention. Error bars: SE. 'S1': Scenario 1, 'S2': Scenario 2; *: $p<0.05$, **: $p<0.01$, ***: $p<0.001$
  • ...and 6 more figures