Large Language Model Agents for Improving Engagement with Behavior Change Interventions: Application to Digital Mindfulness

Harsh Kumar, Suhyeon Yoo, Angela Zavaleta Bernuy, Jiakai Shi, Huayin Luo, Joseph Williams, Anastasia Kuzminykh, Ashton Anderson, Rachel Kornfield

TL;DR

This work addresses declining engagement with self-directed wellness exercises by evaluating LLM-based social support agents in mindfulness interventions. Two randomized studies, a large single-session study and a three-week deployment, compare an information-providing LLM with a reflection-focused LLM. The sociable information agent (Mindy) significantly improves engagement, while the reflection agent yields limited gains. The findings show the potential of LLM agents to bridge gaps in digital health support and offer design guidance for incorporating social and conversational elements into scalable interventions, while highlighting safety, memory, and ethical considerations for real-world deployment. The results also indicate that baseline digital interventions (videos and reminders) are already effective and that LLM augmentation offers incremental improvements, especially in initiating and sustaining practice. This underscores the need for longitudinal validation and careful, privacy-conscious design before broad adoption.

Abstract

Although engagement in self-directed wellness exercises typically declines over time, integrating social support such as coaching can sustain it. However, traditional forms of support are often inaccessible due to high costs and complex coordination. Large Language Models (LLMs) show promise in providing human-like dialogues that could emulate social support. Yet in-depth, in situ investigations of LLMs to support behavior change remain scarce. We conducted two randomized experiments to assess the impact of LLM agents on user engagement with mindfulness exercises. The first, a single-session study, involved 502 crowdworkers; the second, a three-week study, included 54 participants. We explored two types of LLM agents: one providing information and another facilitating self-reflection. Both agents enhanced users' intentions to practice mindfulness. However, only the information-providing LLM, featuring a friendly persona, significantly improved engagement with the exercises. Our findings suggest that specific LLM agents may bridge the social support gap in digital health interventions.

Paper Structure (70 sections, 1 equation, 11 figures, 2 tables)

This paper contains 70 sections, 1 equation, 11 figures, 2 tables.

Figures (11)

  • Figure 1: Design of the formative study. All participants received the core mindfulness message. They were then randomly assigned to engage or not, in this order, with: (1) the Information Chatbot, which provided information related to mindfulness; (2) the Instructional Video, which guided participants through mindful breathing; and (3) the Reflection Chatbot, which engaged participants in a self-reflection exercise on their understanding of mindfulness. The 2 (Information Chatbot: Present vs Absent) x 2 (Instructional Video: Present vs Absent) x 2 (Reflection Chatbot: Present vs Absent) factorial design allowed us to explore user perspectives and intentions for different combinations of interaction techniques to enhance the core mindfulness message.
  • Figure 2: Mean intention to practice mindfulness again after engaging in the given exercise(s), on a scale of 1 (not likely at all) to 7 (extremely likely), for all conditions. Each data point represents the mean intention to practice mindfulness again among individuals assigned to the condition, and error bars represent ± one standard error of the mean. Contrasts are calculated using estimated marginal means with ANOVA, and significant relationships (p < 0.05) are displayed. *: p ≤ 0.05, ***: p ≤ 0.001 (p-values adjusted using the Tukey method for comparing a family of 8 estimates).
  • Figure 3: Design of the deployment study. Participants were reminded via email every other day, over a period of three weeks, to engage in mindfulness exercises, totaling 10 reminders. (A) Each reminder email contained a link to the mindfulness exercise interface. Additionally, half of the participants, selected randomly, were given information and a distinct link to interact with Mindy, the sociable information chatbot. (B) Clicking on the mindfulness exercise interface link led participants to the study's initial stage (see Figure 4), where they were randomly assigned to either receive only the tutorial video for exercises or to experience the tutorial video followed by interaction with the reflection chatbot. (C) Separately, half of the participants were provided with the option to access Mindy at any time during the study, regardless of their engagement with the mindfulness exercise interface. The study's structure was a 2 x 2 factorial design, varying the presence of the Sociable Information LLM (Present vs Absent) and the Reflection LLM post-video (Present vs Absent).
  • Figure 4: Mindfulness Exercise Interface. Participants were taken to this interface by clicking the exercise link in the email. Initially, they responded to questions about their current stress levels and mental state. (1) They were then directed to a 10-minute instructional video on mindfulness. Each participant was shown a video randomly selected from a pool of six, each focused on teaching a proven mindfulness exercise. (2) After watching the video, half of the participants, randomly selected at the beginning of the study, were given the option to engage in a dialogue with the Reflection LLM agent.
  • Figure 5: Participant engagement rates in the deployment study across different intervention conditions. Engagement is defined as whether or not a participant engaged in a particular activity for a particular reminder. Error bars are ± one standard error of the mean.
  • ...and 6 more figures
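
The factorial designs and error bars described in the captions above can be illustrated with a minimal sketch. It is not the authors' analysis code: the factor names, the simple per-participant randomization, and the helper functions `factorial_conditions`, `assign`, and `mean_and_sem` are all assumptions made for illustration. It enumerates the Present/Absent cells of the formative study's 2 x 2 x 2 design and computes a mean with its standard error, the quantity the error bars are stated to represent.

```python
import itertools
import random
import statistics

# Hypothetical factor names, taken loosely from the Figure 1 caption.
FACTORS = ("information_chatbot", "instructional_video", "reflection_chatbot")


def factorial_conditions(factors=FACTORS):
    """Return every Present (True) / Absent (False) combination of the factors."""
    return [
        dict(zip(factors, levels))
        for levels in itertools.product((False, True), repeat=len(factors))
    ]


def assign(participant_ids, conditions, seed=0):
    """Assign each participant to one condition uniformly at random.

    Simple randomization is an assumption; the source does not say
    whether blocking or stratification was used.
    """
    rng = random.Random(seed)
    return {pid: rng.choice(conditions) for pid in participant_ids}


def mean_and_sem(scores):
    """Return the mean and the standard error of the mean (sample SD / sqrt(n))."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sem = statistics.stdev(scores) / n ** 0.5
    return mean, sem


if __name__ == "__main__":
    conditions = factorial_conditions()
    print(len(conditions), "conditions")
    assignment = assign(range(16), conditions, seed=42)
    # Made-up 1-7 intention ratings, used only to exercise mean_and_sem.
    print(mean_and_sem([4, 5, 6, 7, 5, 6]))
```

With three binary factors the enumeration yields 8 conditions, matching the "family of 8 estimates" mentioned in the Figure 2 caption. Passing only two factor names gives the 4 cells of the deployment study's 2 x 2 design.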