Bloom: Designing for LLM-Augmented Behavior Change Interactions

Matthew Jörke, Defne Genç, Valentin Teutschbein, Shardul Sapkota, Sarah Chung, Paul Schmiedmayer, Maria Ines Campero, Abby C. King, Emma Brunskill, James A. Landay

TL;DR

Bloom investigates how an LLM-powered health coaching agent can augment established UI-based behavior-change interactions to promote physical activity. Bloom integrates chat informed by motivational interviewing (MI) with goal setting, planning, tracking, ambient feedback, and data visualizations, and was evaluated using qualitative and quantitative data from a four-week field study (N=54). Compared with a non-LLM control, the LLM condition shifted psychological factors such as perceived benefits, enjoyment, and self-compassion, and yielded more personalized plans and higher engagement, though short-term gains in objective physical activity were similar across conditions. These findings suggest LLMs may primarily influence readiness and maintenance pathways for longer-term behavior change, with design implications for coaching, agency, and safety in future LLM-augmented health interventions. The work also contributes a safety benchmark and a red-teaming dataset to advance safe deployment of health-oriented LLM coaching systems.

Abstract

Large language models (LLMs) offer novel opportunities to support health behavior change, yet existing work has narrowly focused on text-only interactions. Building on decades of HCI research demonstrating the effectiveness of UI-based interactions, we present Bloom, an application for physical activity promotion that integrates an LLM-based health coaching chatbot with established UI-based interactions. As part of Bloom's development, we conducted a red-teaming evaluation and contribute a safety benchmark dataset. In a four-week randomized field study (N=54) comparing Bloom to a non-LLM control, we observed important shifts in psychological outcomes: participants in the LLM condition reported stronger beliefs that activity was beneficial, greater enjoyment, and more self-compassion. Both conditions significantly increased physical activity levels, doubling the proportion of participants meeting recommended weekly guidelines, though we observed no significant differences between conditions. Instead, our findings suggest that LLMs may be more effective at shifting mindsets that precede longer-term behavior change.

Paper Structure

This paper contains 69 sections, 1 equation, 8 figures, 11 tables.

Figures (8)

  • Figure 1: Additional screens in the Bloom application. (A) The Plan tab shows the user's current weekly plan, an LLM-generated progress summary, along with missed, upcoming, and past activities. (B) The Insights tab presents visualizations of wearable data, annotated with LLM-generated summaries of trends and progress. (C) When the user's garden grows, a celebratory modal appears with an LLM-generated message linking progress in the ambient display to recent achievements. (D) During at-will chat, the user can request edits to their plan in natural language, upon which the agent calls plan edit tools. The generated plan is shown as an inline chat widget.
  • Figure 2: System Architecture & Context Management. (A) System Inputs: Bloom draws on wearable data from Apple's HealthKit API and natural-language input from LLM chats. (B) Application Context: Bloom integrates quantitative context, including wearable data and weekly plan progress, with qualitative context in the form of memory summaries from past conversations. (C) LLM Chatbot: Beebo operates in three modes (onboarding, check-in, and at-will chat) and uses dialogue state management and motivational interviewing prompt chains. All responses are passed through safety filters, and the agent can invoke tools to query health data or generate and edit weekly plans. (D) System Outputs: The LLM produces push notifications, natural-language plan and data summaries, inline chat widgets in response to tool calls, and celebratory progress messages linked to the ambient display (e.g., garden growth).
  • Figure 3: Bloom's Ambient Display. (A) Theme Preference Study. In an online preference study, participants rated three candidate themes (space, underwater, and garden), each with matching avatars. (B) Final Ambient Display. Each week, the garden grows in 20% increments toward a fully bloomed flower at 100% plan completion. Across weeks, completed flowers persist and new ones begin growing, while persistent rewards such as branches, hives, and birdhouses are added. Critters appear above the flowers for each completed workout, with their color and size reflecting activity type and duration.
  • Figure 4: Treatment and Control Conditions. The treatment condition is the Bloom app (Section \ref{sec:system}), which includes all LLM-based features. The control condition does not include any LLM-based features: it removes the chat, uses UI-based plan creation, removes progress or data summaries, and uses template-based notifications.
  • Figure 5: Field Study Procedures & Data Collection. During onboarding, participants completed a pre-study survey and a one-hour onboarding interview, and provided three months of baseline HealthKit data upon installing the application. The main study period lasted four weeks, during which participants used Bloom and completed weekly and (optional) daily surveys. During offboarding, participants completed a one-hour interview and a post-study survey.
  • ...and 3 more figures