
SeSaMe: A Framework to Simulate Self-Reported Ground Truth for Mental Health Sensing Studies

Akshat Choube, Vedant Das Swain, Varun Mishra

TL;DR

The results indicate that SeSaMe is a promising approach, although its alignment may vary across scales and specific prediction objectives. The paper also discusses SeSaMe's potential for addressing some of the challenges of ground-truth collection in passive sensing studies.

Abstract

Advances in mobile and wearable technologies have made it possible to passively monitor a person's mental, behavioral, and affective health. These approaches typically rely on longitudinal collection of self-reported outcomes, e.g., depression, stress, and anxiety, to train machine learning (ML) models. However, the need to self-report continuously places a significant burden on participants, often resulting in attrition, missing labels, or insincere responses. In this work, we introduce the Scale Scores Simulation using Mental Models (SeSaMe) framework to alleviate participants' burden in digital mental health studies. By leveraging pre-trained large language models (LLMs), SeSaMe enables the simulation of participants' responses on psychological scales. In SeSaMe, researchers can prompt LLMs with information on participants' internal behavioral dispositions, enabling the LLMs to construct mental models of participants and simulate their responses on psychological scales. We demonstrate an application of SeSaMe in which we use GPT-4 to simulate responses on one scale, using responses on another scale as behavioral information. We also evaluate the alignment between human and SeSaMe-simulated responses to psychological scales. We then present experiments that examine the utility of SeSaMe-simulated responses as ground truth for training ML models by replicating established depression and anxiety screening tasks from a previous study. Our results indicate that SeSaMe is a promising approach, but its alignment may vary across scales and specific prediction objectives. We also observed that models trained on simulated data performed on par with models trained on real data in most evaluation scenarios. We conclude by discussing the potential implications of SeSaMe for addressing some of the challenges researchers face with ground-truth collection in passive sensing studies.
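The core step described above, prompting an LLM with a participant's responses on one scale (e.g., PHQ-9) to simulate their responses on another (e.g., GAD-7), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Figure 1 shows an abbreviated version of the paper's actual prompt, whereas the prompt wording, reply format, and helper names `build_prompt`, `parse_scores`, and `simulate_total` here are hypothetical, the item texts are abbreviated, and `llm` is any function mapping a prompt string to a reply string, standing in for a GPT-4 call. The scoring itself is standard: each PHQ-9 and GAD-7 item is rated 0 to 3, so totals range from 0 to 27 and from 0 to 21, respectively.

```python
import re
from typing import Callable, List, Sequence, Tuple

# Response options shared by PHQ-9 and GAD-7; list index = item score (0-3).
OPTIONS = ["Not at all", "Several days", "More than half the days", "Nearly every day"]

# Abbreviated item wording, for illustration only (not the official item text).
PHQ9_ITEMS = [
    "Little interest or pleasure in doing things",
    "Feeling down, depressed, or hopeless",
    "Trouble falling or staying asleep, or sleeping too much",
    "Feeling tired or having little energy",
    "Poor appetite or overeating",
    "Feeling bad about yourself",
    "Trouble concentrating on things",
    "Moving or speaking slowly, or being fidgety or restless",
    "Thoughts that you would be better off dead or of hurting yourself",
]
GAD7_ITEMS = [
    "Feeling nervous, anxious, or on edge",
    "Not being able to stop or control worrying",
    "Worrying too much about different things",
    "Trouble relaxing",
    "Being so restless that it is hard to sit still",
    "Becoming easily annoyed or irritable",
    "Feeling afraid as if something awful might happen",
]


def build_prompt(source_items: Sequence[str], source_scores: Sequence[int],
                 target_items: Sequence[str]) -> str:
    """Hypothetical prompt: describe a person via one scale, ask for another."""
    if len(source_items) != len(source_scores):
        raise ValueError("one score per source item is required")
    lines = ["Over the last 2 weeks, a person answered the following questions:"]
    for item, score in zip(source_items, source_scores):
        if not 0 <= score <= 3:
            raise ValueError(f"item scores must be 0-3, got {score}")
        lines.append(f"- {item}: {OPTIONS[score]}")
    lines.append("Predict how the same person would answer each question below.")
    lines.append("Reply with one line per question in the form '<number>: <score 0-3>'.")
    for i, item in enumerate(target_items, start=1):
        lines.append(f"{i}. {item}")
    return "\n".join(lines)


def parse_scores(reply: str, n_items: int) -> List[int]:
    """Extract '<number>: <score>' lines from the reply; every item must be present."""
    pairs = re.findall(r"^\s*(\d+)\s*:\s*([0-3])\b", reply, flags=re.M)
    found = {int(num): int(score) for num, score in pairs}
    missing = [i for i in range(1, n_items + 1) if i not in found]
    if missing:
        raise ValueError(f"reply is missing items {missing}")
    return [found[i] for i in range(1, n_items + 1)]


def simulate_total(llm: Callable[[str], str], source_items: Sequence[str],
                   source_scores: Sequence[int],
                   target_items: Sequence[str]) -> Tuple[List[int], int]:
    """Query `llm` (any str -> str callable) and return (item scores, total)."""
    reply = llm(build_prompt(source_items, source_scores, target_items))
    scores = parse_scores(reply, len(target_items))
    return scores, sum(scores)
```

For example, passing a stub such as `lambda p: "\n".join(f"{i}: 1" for i in range(1, 8))` as `llm` yields seven GAD-7 item scores of 1 and a total of 7. The parser raises an error on incomplete replies instead of imputing missing items, a choice made in this sketch so that malformed model output is not silently mixed into training labels.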

Paper Structure

This paper contains 22 sections, 1 equation, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: An example prompt for generating GAD-7 scores from provided PHQ-9 scores. Only a few questions from each scale are represented for brevity, but the actual prompt included all nine PHQ-9 and seven GAD-7 questions.
  • Figure 2: Density distributions of original and GPT-4-simulated scores for the StudentLife dataset.
  • Figure 3: Density distributions of original and GPT-4-simulated scores for the DepreST-CAT dataset.
  • Figure 4: Comparison of F1-scores for XGBoost models trained on original and GPT-4-simulated scores for depression (left) and anxiety (right).
  • Figure 5: Scatter plots comparing the distributions of original and simulated scores when GAD-7 scores are used to simulate PHQ-9 scores (left) and vice versa (right). Circle sizes are proportional to the number of data points. The GPT-4-simulated data follow a more linear pattern, making them less diverse.
  • ...and 3 more figures
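Figures 2 to 5 compare original and simulated scores through their distributions and through downstream F1. A simpler participant-level alignment check can be sketched in plain Python under stated assumptions: the helper names are hypothetical, the cutoff of 10 is a commonly used PHQ-9/GAD-7 screening threshold assumed here rather than taken from the paper, and the label F1 below measures agreement between labels derived from original and simulated totals. It is not the XGBoost train-on-simulated pipeline behind Figure 4.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def to_labels(totals: Sequence[int], cutoff: int = 10) -> List[int]:
    """Binarize scale totals: 1 if total >= cutoff (the cutoff of 10 is an assumption)."""
    return [int(t >= cutoff) for t in totals]


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score for the positive class (label 1); 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def alignment_report(original: Sequence[int], simulated: Sequence[int],
                     cutoff: int = 10) -> Dict[str, float]:
    """Compare original and simulated totals for the same participants (hypothetical helper)."""
    if len(original) != len(simulated) or len(original) == 0:
        raise ValueError("need equal-length, non-empty score lists")
    return {
        # Average per-participant error in total score.
        "mae": mean(abs(o - s) for o, s in zip(original, simulated)),
        # Agreement of screening labels derived from each set of totals.
        "label_f1": f1(to_labels(original, cutoff), to_labels(simulated, cutoff)),
        # A smaller spread in simulated totals is one rough signal of the
        # reduced diversity noted in the Figure 5 caption.
        "sd_original": pstdev(original),
        "sd_simulated": pstdev(simulated),
    }
```

With original totals `[3, 12, 15, 8, 20]` and simulated totals `[5, 11, 9, 10, 18]`, for instance, the report gives a mean absolute error of 2.6 and a label F1 of about 0.67, and the simulated totals have a smaller spread than the original ones.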