
MemPal: Leveraging Multimodal AI and LLMs for Voice-Activated Object Retrieval in Homes of Older Adults

Natasha Maniar, Samantha W. T. Chan, Wazeer Zulfikar, Scott Ren, Christine Xu, Pattie Maes

TL;DR

MemPal addresses the memory challenges of older adults by combining a wearable egocentric camera, a voice-based LLM interface, and a vision-language system to support retrospective object retrieval through natural conversation. The approach improves object-finding performance over a no-aid baseline and yields results comparable to visual aids, while also enabling an activity diary for context-based queries and potential safety reminders. User feedback indicates overall usefulness and acceptable usability, though comfort, accuracy, and onboarding need refinement for broader adoption. The work demonstrates the feasibility of a multimodal memory assistant that preserves privacy by storing text rather than image data, and points to future directions for personalized, proactive, and privacy-preserving memory support in home environments.

Abstract

Older adults have increasing difficulty with retrospective memory, which hinders their ability to perform daily activities and places stress on the caregivers responsible for their wellbeing. Recent developments in Artificial Intelligence (AI) and large context-aware multimodal models offer an opportunity to create memory support systems that assist older adults with common issues such as object finding. This paper describes the development of MemPal, an AI-based wearable memory assistant that helps older adults with a common problem, finding lost objects at home, and presents results from tests of the system in older adults' own homes. Using visual context from a wearable camera, the multimodal LLM system creates a real-time, automated text diary of the person's activities for memory support and offers object retrieval assistance through a voice-based interface. The system is designed to support additional use cases such as context-based proactive safety reminders and recall of past actions. We report on a quantitative and qualitative study with N=15 older adults in their own homes, which showed improved object-finding performance with audio-based assistance compared to no aid, as well as positive overall user perceptions of the system. We discuss further applications of MemPal's design as a multi-purpose memory aid and future design guidelines for adapting memory assistants to older adults' unique needs.
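To make the text-diary idea concrete, here is a minimal Python sketch, assuming each diary entry is a timestamped text record of the wearer's location, activity, and visible objects. The names `DiaryEntry`, `DailyDiary`, `last_seen`, and `answer_where_is` are hypothetical, and the simple most-recent-match lookup stands in for the LLM-based question answering the paper describes; it is not MemPal's implementation. Only text is stored, mirroring the privacy choice of keeping text rather than images.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class DiaryEntry:
    """Hypothetical text-only diary record derived from one camera observation."""
    timestamp: datetime
    location: str        # e.g. a room label given during the onboarding home tour
    activity: str        # short text caption of what the wearer is doing
    objects: List[str]   # names of objects detected in view


@dataclass
class DailyDiary:
    """Hypothetical in-memory stand-in for the Daily Diary DB (no images kept)."""
    entries: List[DiaryEntry] = field(default_factory=list)

    def record(self, entry: DiaryEntry) -> None:
        self.entries.append(entry)

    def last_seen(self, obj: str) -> Optional[DiaryEntry]:
        """Return the most recent entry mentioning `obj` (case-insensitive), or None."""
        target = obj.lower()
        matches = [e for e in self.entries
                   if target in (o.lower() for o in e.objects)]
        return max(matches, key=lambda e: e.timestamp, default=None)


def answer_where_is(diary: DailyDiary, obj: str) -> str:
    """Turn the latest sighting into a short spoken-style answer."""
    entry = diary.last_seen(obj)
    if entry is None:
        return f"I have no record of the {obj} yet."
    return (f"I last saw the {obj} in the {entry.location} "
            f"at {entry.timestamp:%H:%M}, while you were {entry.activity}.")
```

For example, after recording the keys in the kitchen at 09:15 and in the living room at 10:40, `answer_where_is(diary, "keys")` answers with the living-room entry, because the lookup keeps only the most recent sighting.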

Paper Structure

This paper contains 73 sections, 16 figures, 5 tables.

Figures (16)

  • Figure 1: System onboarding phase: Using the MemPal app, the user first watches an instructional demo video before beginning the home tour video walk-through. Once the video is processed, the verbally labeled locations are populated in the app.
  • Figure 2: System implementation overview, highlighting the use of real-time visual context from a wearable camera for a question-answering language system with wearable audio I/O. The visual context from the camera consists of location tracking, object tracking, and activity and scene recognition, which together create a virtual diary (Daily Diary DB) that is later used for querying.
  • Figure 3: An example of an activity log, a query log, and the higher-level activities extracted (from Participant 5 of the user study)
  • Figure 4: User flow for object retrieval. When the user places an object at a location in the house, MemPal stores this information, which can later be retrieved during QA. The user can ask for the location of the specified object as well as follow-up questions.
  • Figure 5: Object retrieval implementation: This workflow shows the question-answering system for object retrieval, which uses the visual context determined in Figure 2 to respond to user queries, starting with query categorization.
  • ...and 11 more figures
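The object retrieval workflow in Figure 5 begins with query categorization. The Python sketch below illustrates only that routing step, under stated assumptions: the three category names are hypothetical, and keyword regular expressions stand in for the paper's actual classifier, whose details are not given here.

```python
import re

# Hypothetical category labels; MemPal's real category set may differ.
OBJECT_RETRIEVAL = "object_retrieval"
ACTIVITY_RECALL = "activity_recall"
OTHER = "other"

# Keyword patterns used as a stand-in classifier (an assumption, not MemPal's method).
# Order matters: object-retrieval phrasing is checked before activity recall.
_PATTERNS = [
    (OBJECT_RETRIEVAL,
     re.compile(r"\b(where (is|are|did i (put|leave))|find my|lost my)\b")),
    (ACTIVITY_RECALL,
     re.compile(r"\b(did i|have i|when did i|what was i doing)\b")),
]


def categorize_query(query: str) -> str:
    """Return the first category whose pattern matches the lowercased query."""
    text = query.lower()
    for category, pattern in _PATTERNS:
        if pattern.search(text):
            return category
    return OTHER
```

Because object-retrieval patterns are checked first, a query such as "Where did I put my glasses?" is routed to object retrieval even though it also contains "did I", while "Did I take my medication this morning?" falls through to activity recall.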