Future You: Designing and Evaluating Multimodal AI-generated Digital Twins for Strengthening Future Self-Continuity

Constanze Albrecht, Chayapatr Archiwaranguprok, Rachel Poonsiriwong, Awu Chen, Peggy Yin, Monchai Lertsutthiwong, Kavin Winson, Hal Hershfield, Pattie Maes, Pat Pataranutaporn

TL;DR

This study investigates whether different modalities (text, voice, avatar) of AI-generated future selves affect Future Self-Continuity and well-being. Using a randomized trial (N=92) and Claude 4 as the conversational backbone, the authors compare three personalized modalities against a neutral control and also benchmark LLM quality against alternatives. Findings show all personalized modalities robustly enhance FSC, hope, and motivation, with interaction quality (persuasiveness, realism, engagement) as a stronger predictor of outcomes than modality. While avatars produced the largest vividness gains, the results indicate that high-quality conversational AI can achieve comparable psychological benefits across modalities, informing scalable design principles for future-self interventions. The work also highlights ethical considerations around autonomy and narrative authorship as AI-mediated self-reflection becomes more prevalent.

Abstract

What if users could meet their future selves today? AI-generated future selves simulate meaningful encounters with a digital twin decades in the future. As AI systems advance, combining cloned voices, age-progressed facial rendering, and autobiographical narratives, a central question emerges: Does the modality of these future selves alter their psychological and affective impact? How might a text-based chatbot, a voice-only system, or a photorealistic avatar shape present-day decisions and our feeling of connection to the future? We report a randomized controlled study (N=92) evaluating three modalities of AI-generated future selves (text, voice, avatar) against a neutral control condition. We also report a systematic model evaluation between Claude 4 and three other Large Language Models (LLMs), assessing Claude 4 across psychological and interaction dimensions and establishing conversational AI quality as a critical determinant of intervention effectiveness. All personalized modalities strengthened Future Self-Continuity (FSC), emotional well-being, and motivation compared to control, with avatar producing the largest vividness gains, yet with no significant differences between formats. Interaction quality metrics, particularly persuasiveness, realism, and user engagement, emerged as robust predictors of psychological and affective outcomes, indicating that how compelling the interaction feels matters more than the form it takes. Content analysis found thematic patterns: text emphasized career planning, while voice and avatar facilitated personal reflection. Claude 4 outperformed ChatGPT 3.5, Llama 4, and Qwen 3 in enhancing psychological, affective, and FSC outcomes.

Paper Structure

This paper contains 35 sections, 8 figures.

Figures (8)

  • Figure 1: Procedure Overview: This figure illustrates the experimental procedure across three modalities (text, voice, avatar). Participants completed a pre-intervention survey, uploaded an image and voice recording, engaged with their AI-generated future self, then completed a post-intervention survey.
  • Figure 2: Experiment setup overview: The system integrates facial age progression, neural voice cloning, and LLM-based contextual modeling to synthesize an AI-generated future self. Participants are randomly assigned across three modalities of the system, enabling interaction through text, audio, and avatar-based conversation.
  • Figure 3: Comparative performance of four language models across eleven psychological and interaction metrics. Claude 4 showed the highest scores across emotional, relational, and realism dimensions; ChatGPT-3.5 and Llama 4 performed moderately, while Qwen 3 scored lowest. Higher values indicate better performance; error bars show ±1 SD.
  • Figure 4: Pre- and post-intervention scores across emotional, motivational, and future-self measures by condition. Bars represent mean scores with standard deviations. Asterisks indicate significant within-condition improvements (*p < .05, **p < .01, ***p < .001). Intervention conditions (avatar, text, voice) show consistent gains in positive affect, motivation, and future-self connection compared to the control condition.
  • Figure 5: Adjusted post-intervention means by condition. Bars represent estimated marginal means from ANCOVA models controlling for baseline scores. Error bars show standard errors. Significance brackets indicate FDR-corrected pairwise comparisons versus control (*q < 0.05, **q < 0.01, ***q < 0.001). All three intervention conditions significantly enhanced future-self related outcomes compared to control, with no significant differences among intervention conditions.
  • ...and 3 more figures
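
The Figure 5 caption reports FDR-corrected q-values for the pairwise comparisons against control. The summary above does not say which FDR procedure was used; the sketch below assumes the common Benjamini-Hochberg step-up method, and the function name `benjamini_hochberg` and the example p-values are hypothetical, not taken from the paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values, in the input order.

    Assumption: the paper's "FDR-corrected" comparisons use this procedure;
    the source text does not confirm it.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each q-value is the minimum
    # of p * m / rank over this rank and all higher ranks (monotonicity).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values for avatar, text, and voice versus control.
example_p = [0.001, 0.02, 0.04]
example_q = benjamini_hochberg(example_p)
```

Each q-value would then be compared against the thresholds in the caption (0.05, 0.01, 0.001) to decide how many asterisks a comparison receives.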