GPTCoach: Towards LLM-Based Physical Activity Coaching

Matthew Jörke; Shardul Sapkota; Lyndsea Warkenthien; Niklas Vainio; Paul Schmiedmayer; Emma Brunskill; James A. Landay

TL;DR

GPTCoach investigates how LLMs can address personalization gaps in mobile health by combining an evidence-based onboarding workflow from the Active Choices program, conversational strategies inspired by motivational interviewing (MI), and real-time wearable data. The authors develop GPTCoach as a technology probe, implement it with a multi-prompt chaining architecture, and evaluate it in a single-session lab study with 16 participants using three months of historical HealthKit data. They find strong MI adherence, perceived personalization, and user comfort with sharing data. Key contributions include three design principles for LLM-based physical activity coaching, the GPTCoach system design with data-grounded prompt chains, and qualitative and quantitative findings on MI fidelity, data utilization, and user experience. The work draws out practical implications for future mobile health systems, for LLM training and evaluation, and for privacy, bias, and safety. It points to a promising path toward scalable, context-aware physical activity coaching while acknowledging its limitations and safety risks.

Abstract

Mobile health applications show promise for scalable physical activity promotion but are often insufficiently personalized. In contrast, health coaching offers highly personalized support but can be prohibitively expensive and inaccessible. This study draws inspiration from health coaching to explore how large language models (LLMs) might address personalization challenges in mobile health. We conduct formative interviews with 12 health professionals and 10 potential coaching recipients to develop design principles for an LLM-based health coach. We then build GPTCoach, a chatbot that implements the onboarding conversation from an evidence-based coaching program, uses conversational strategies from motivational interviewing, and incorporates wearable data to create personalized physical activity plans. In a lab study with 16 participants using three months of historical data, we find promising evidence that GPTCoach gathers rich qualitative information to offer personalized support, with users feeling comfortable sharing concerns. We conclude with implications for future research on LLM-based physical activity support.
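GPTCoach incorporates months of wearable data, and its backend aggregates and featurizes that data before it reaches the model. One plausible shape for such a step is sketched below. It is a minimal sketch, assuming daily step counts as the only input: the `DailySteps` record, `weekly_averages`, and `summarize_for_prompt` are hypothetical names for illustration and are not GPTCoach's actual featurization code.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean


@dataclass
class DailySteps:
    """One day of step data (hypothetical record shape, not a HealthKit type)."""
    day: date
    steps: int


def weekly_averages(records: list[DailySteps]) -> dict[date, float]:
    """Group daily counts by calendar week (keyed by that week's Monday) and average each week."""
    buckets: dict[date, list[int]] = {}
    for record in records:
        monday = record.day - timedelta(days=record.day.weekday())
        buckets.setdefault(monday, []).append(record.steps)
    # Sorting by the Monday key keeps the weeks in chronological order.
    return {week: mean(counts) for week, counts in sorted(buckets.items())}


def summarize_for_prompt(records: list[DailySteps]) -> str:
    """Render a compact text summary that could be placed in an LLM prompt."""
    weeks = weekly_averages(records)
    if not weeks:
        return "No step data available."
    latest_week, latest_avg = list(weeks.items())[-1]
    return (
        f"{len(weeks)} weeks of step data; "
        f"mean weekly average {mean(list(weeks.values())):.0f} steps/day; "
        f"week of {latest_week.isoformat()} averaged {latest_avg:.0f} steps/day."
    )
```

Weekly averaging smooths day-to-day noise and keeps three months of data to roughly 13 numbers, which is easier for a model to reason over than raw daily samples. A week with missing days is averaged over the days present, so a real pipeline would likely also need to report data coverage.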

Paper Structure (75 sections, 27 figures, 20 tables)

Introduction
Related Work
Health Coaching with Humans & Conversational Agents
Personal Informatics & Reflection on Personal Data
Formative Interview Study
Participants
Procedure
Analysis
Results
RQ1: The Role of Coaches as Facilitators, Educators, and Supporters
RQ2: The Role of Data & Technology as Guiders, not Drivers
GPTCoach: Design & Implementation
Design Principles
The Active Choices Program
Design Process
...and 60 more sections

Figures (27)

Figure 1: Overview of GPTCoach's System Architecture. HealthKit data from connected devices (e.g., iPhones and wearables) are synced to our Google Cloud Firestore database using an iOS application. A Python backend server handles several tasks: fetching health data from the database, aggregating and featurizing health data, managing LLM prompt logic, interfacing with the OpenAI API to query GPT-4, executing tool calls, generating data visualizations for the frontend, and sending and receiving chat messages. The TypeScript/React frontend web app hosts the chat interface and renders interactive data visualizations.
Figure 2: Overview and a walkthrough of GPTCoach's Prompt Chains. (A) On the left, we show an overview of the prompt chains, in which the first chain manages the dialogue state, the second chain grounds the model's response in MI strategies, and the third chain determines whether the response should be augmented with health data. (B) On the right, we show the outputs of each prompt chain for an example exchange.
Figure 3: Participant Responses to Survey Items on User Experience & Quality of Advice. Participants had an overwhelmingly positive, comfortable, and supportive experience interacting with GPTCoach. They rated the advice they received as personalized, actionable, and not unsolicited. Full questions are provided in the Appendix.
Figure 4: Progression of GPTCoach's Dialogue States By Turn Index. We find that GPTCoach adaptively allocates more conversational turns to gathering information about past experiences, barriers, and motivation, and that it allocates the most turns to the goal-setting state. Tool calls are made appropriately during the past experience, goal-setting, and advice states.
Figure 5: Progression of GPTCoach's Internal MI Strategies By Turn Index. We find that most of the conversation is spent asking questions and that Question, Reflect, and Affirm precede Advise with Permission and Giving Information. Asking questions and offering reflections before giving advice is more closely aligned with high-quality counselor behavior [perez2019makes].
...and 22 more figures
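Figure 2 describes three chained prompts per turn: one tracks the dialogue state, one grounds the response in an MI strategy, and one decides whether to augment the response with health data. The sketch below shows that control flow with a pluggable model. It rests on several assumptions: the `llm` callable is a hypothetical text-in, text-out interface (no OpenAI API calls are made), the label sets are simplified from the states and strategies named in Figures 4 and 5, a yes/no decision stands in for GPTCoach's tool calls, and the final reply-drafting call is our addition. The real prompts are not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical label sets, simplified from the states and strategies in Figures 4 and 5.
DIALOGUE_STATES = ("past_experience", "barriers", "motivation", "goal_setting", "advice")
MI_STRATEGIES = ("question", "reflect", "affirm", "advise_with_permission", "give_information")

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out


@dataclass
class ChainResult:
    state: str
    strategy: str
    use_data: bool
    reply: str


def _pick(llm: LLM, prompt: str, options: tuple[str, ...], default: str) -> str:
    """Ask the model for one label; fall back to a default if the answer is off-list."""
    answer = llm(f"{prompt}\nAnswer with one of: {', '.join(options)}.").strip().lower()
    return answer if answer in options else default


def run_turn(llm: LLM, history: list[str], health_summary: str) -> ChainResult:
    transcript = "\n".join(history)
    # Chain 1: classify where the onboarding conversation currently is.
    state = _pick(llm, f"Conversation so far:\n{transcript}\nWhich dialogue state are we in?",
                  DIALOGUE_STATES, DIALOGUE_STATES[0])
    # Chain 2: choose an MI strategy to ground the next reply.
    strategy = _pick(llm, f"State: {state}\nConversation:\n{transcript}\nWhich MI strategy fits the next reply?",
                     MI_STRATEGIES, "question")
    # Chain 3: decide whether the reply should draw on the user's health data.
    use_data = _pick(llm, f"State: {state}; strategy: {strategy}\nShould the reply reference the user's activity data?",
                     ("yes", "no"), "no") == "yes"
    # Final call (assumed): draft the reply, adding the data summary only when chain 3 said yes.
    context = f"\nActivity data: {health_summary}" if use_data else ""
    reply = llm(f"You are a physical activity coach. Use the MI strategy '{strategy}' "
                f"in the '{state}' phase.{context}\nConversation:\n{transcript}\nCoach:")
    return ChainResult(state, strategy, use_data, reply.strip())
```

Constraining the first three calls to a fixed label set, with a safe default when the model answers off-list, keeps each intermediate decision inspectable. That property is what makes turn-by-turn analyses like those in Figures 4 and 5 possible.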
