Understanding the Impact of Long-Term Memory on Self-Disclosure with Large Language Model-Driven Chatbots for Public Health Intervention

Eunkyung Jo, Yuin Jeong, SoHyun Park, Daniel A. Epstein, Young-Ho Kim

TL;DR

The paper examines how long-term memory (LTM) in large language model-driven chatbots influences health self-disclosure within a public health intervention. It reports a real-world study of CareCall with and without LTM, analyzing 1,252 call logs and nine interviews to compare disclosure depth, user familiarity, and perceived care. The findings show that LTM increases health-related detail and fosters positive, empathetic impressions, but they also reveal friction around chronic health topics and privacy concerns, highlighting the need for selective and responsible memory design. The work contributes empirical evidence and design guidance for integrating LTM into public health chatbots to improve engagement and data quality while balancing privacy considerations.

Abstract

Recent large language models (LLMs) offer the potential to support public health monitoring by facilitating health disclosure through open-ended conversations but rarely preserve the knowledge gained about individuals across repeated interactions. Augmenting LLMs with long-term memory (LTM) presents an opportunity to improve engagement and self-disclosure, but we lack an understanding of how LTM impacts people's interaction with LLM-driven chatbots in public health interventions. We examine the case of CareCall -- an LLM-driven voice chatbot with LTM -- through the analysis of 1,252 call logs and interviews with nine users. We found that LTM enhanced health disclosure and fostered positive perceptions of the chatbot by offering familiarity. However, we also observed challenges in promoting self-disclosure through LTM, particularly around addressing chronic health conditions and privacy concerns. We discuss considerations for LTM integration in LLM-driven chatbots for public health monitoring, including carefully deciding what topics need to be remembered in light of public health goals.

Paper Structure (32 sections, 3 figures, 3 tables)

Figures (3)

  • Figure 1: Architecture of the two different versions of CareCall chatbots, an open-ended dialogue system powered by an LLM called HyperCLOVA [Kim2021HyperCLOVA]. (a) In the initial version of CareCall without LTM, the system generates a response (Ⓒ) by feeding the current dialogue history (Ⓐ) into the LLM (Ⓑ) that was fine-tuned in advance with an example dialogue corpus that covers five health topics: meals, sleep, health, going out, and physical activity. The user information obtained from previous calls did not affect future calls since this version did not have long-term memory. (b) CareCall with LTM retains user information from the call logs. At the end of each session, a summarizer driven by an LLM (Ⓕ) generates summary sentences that are relevant to the five LTM topics (see below), which are stored and updated by the memory management layer (Ⓓ). The summary sentences are then included in the model input (Ⓔ) so that the underlying LLM (Ⓑ′) can take that knowledge into account when generating responses in the following sessions. In this version, the LLM (Ⓑ′) was further fine-tuned with an additional example dialogue corpus designed as a multi-session chat in memory-augmented format.
  • Figure 2: Overview of sampling and screening users from municipalities and the final datasets for the $LTM^{yes}$ and $LTM^{no}$ groups.
  • Figure 3: Estimated means and 95% confidence intervals of code counts about Meals, Sleep, Health, Clinical, and Activity by the cumulative number of LTM events in the $LTM^{yes}$ group. The colored lines indicate the estimated means and the shaded areas indicate 95% confidence intervals of the code counts per call for each code. Overall, the repeated experiences of LTM events led to greater disclosure of more detailed information across the five categories.
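
The memory loop described in the Figure 1 caption (summarize at the end of a session, store the summaries, prepend them to the next session's model input) can be illustrated with a minimal Python sketch. This is not CareCall's implementation: the names `MemoryStore`, `build_model_input`, and `run_session`, the `[memory:topic]` line format, the overwrite-per-topic update policy, and the toy `toy_respond` and `toy_summarize` functions that stand in for the LLM (Ⓑ′) and the summarizer (Ⓕ) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical type aliases: a summarizer maps a dialogue to {topic: summary},
# and a responder maps the full model input string to a reply.
Summarizer = Callable[[List[str]], Dict[str, str]]
Responder = Callable[[str], str]


class MemoryStore:
    """Stand-in for the memory management layer: one summary per topic."""

    def __init__(self) -> None:
        self._memory: Dict[str, str] = {}

    def update(self, summaries: Dict[str, str]) -> None:
        # Assumed policy: a newer summary overwrites the older one per topic.
        self._memory.update(summaries)

    def as_prompt_lines(self) -> List[str]:
        return [f"[memory:{t}] {s}" for t, s in sorted(self._memory.items())]


def build_model_input(memory: MemoryStore, history: List[str]) -> str:
    # Stored summaries come first, followed by the current dialogue history.
    return "\n".join(memory.as_prompt_lines() + history)


def run_session(user_turns: List[str], memory: MemoryStore,
                respond: Responder, summarize: Summarizer) -> List[str]:
    history: List[str] = []
    for turn in user_turns:
        history.append(f"user: {turn}")
        history.append(f"bot: {respond(build_model_input(memory, history))}")
    # At the end of the session, summarize the call and update the memory.
    memory.update(summarize(history))
    return history


def toy_summarize(history: List[str]) -> Dict[str, str]:
    # Toy summarizer: remembers the last user turn that mentions sleep.
    out: Dict[str, str] = {}
    for line in history:
        if line.startswith("user: ") and "sleep" in line:
            out["sleep"] = line[len("user: "):]
    return out


def toy_respond(model_input: str) -> str:
    # Toy responder: refers back to a stored memory line if one is present.
    remembered = [l for l in model_input.splitlines() if l.startswith("[memory:")]
    if remembered:
        return "Last time you said: " + remembered[0].split("] ", 1)[1]
    return "How did you sleep?"
```

Running two sessions against the same `MemoryStore` shows the difference between the two versions: the first call has no stored summaries, so the reply is generic, while the second call receives the stored sleep summary in its input and can refer back to it. Without the `memory.update(...)` step, every call would behave like the first.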