Representation Learning of Daily Movement Data Using Text Encoders

Alexander Capstick, Tianyu Cui, Yu Chen, Payam Barnaghi

TL;DR

This work tackles learning meaningful representations from irregular, discrete-valued in-home activity time-series for people with Dementia. The authors convert each day into a text string and fine-tune a sentence-embedding model (SE-MiniLM) with a triplet loss over a $30$-day window to produce personalized day embeddings. These embeddings enable clustering into distinct activity patterns, vector search for similar days, and monitoring of behavior changes, with initial evidence from UTI-label analyses and visualizations (e.g., 5 clusters identified by k-means). By leveraging pre-trained language-model representations and semantic similarity in vector space, the approach supports clinically relevant retrieval and change detection to inform personalised care delivery.
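
The vector search over day embeddings mentioned above can be illustrated with a small sketch. It assumes each day has already been encoded as a plain list of floats and ranks stored days by cosine similarity to a query day. The function names `cosine_similarity` and `most_similar_days` and the dictionary layout are hypothetical and not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors (assumption)
    return dot / (norm_u * norm_v)


def most_similar_days(query, embeddings, k=3):
    """Return the k (label, score) pairs most similar to `query`.

    `embeddings` maps a hypothetical day label to its embedding vector.
    """
    scored = [(cosine_similarity(query, vec), label)
              for label, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(label, score) for score, label in scored[:k]]
```

A brute-force scan like this is only practical for small collections; a dedicated vector index would be the usual choice at scale.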

Abstract

Time-series representation learning is a key area of research for remote healthcare monitoring applications. In this work, we focus on a dataset of recordings of in-home activity from people living with Dementia. We design a representation learning method based on converting activity to text strings that can be encoded using a language model fine-tuned to transform data from the same participants within a $30$-day window to similar embeddings in the vector space. This allows for clustering and vector searching over participants and days, and the identification of activity deviations to aid with personalised delivery of care.
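
A minimal sketch of the fine-tuning objective described above, under stated assumptions: two days form a positive pair when they come from the same participant and lie within $30$ days of each other, and a standard triplet loss pulls positives together and pushes negatives apart. The Euclidean distance, the margin value, the inclusive window boundary, and the helper names are assumptions for illustration; the abstract does not specify them, nor how negatives are chosen.

```python
import math
from datetime import date

WINDOW_DAYS = 30  # window length stated in the abstract
MARGIN = 1.0      # assumed margin; not given in the abstract


def is_positive_pair(a, b, window_days=WINDOW_DAYS):
    """a, b are (participant_id, date) tuples.

    Positive if same participant and at most `window_days` apart
    (the inclusive boundary is an assumption).
    """
    same_participant = a[0] == b[0]
    return same_participant and abs((a[1] - b[1]).days) <= window_days


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=MARGIN):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so only triplets that violate this contribute to training.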

Paper Structure (19 sections, 10 figures, 1 table)

Figures (10)

  • Figure 1: Preprocessing of data. For a single participant, we present their data collected over a single day. The left graph shows the raw measurements; the central graph shows the measurements after the mode of each 20-minute window is taken and a token for no activity ("Nowhere") is assigned; and the right graph shows the day as a single text string that can be interpreted by a language model.
  • Figure 2: t-SNE of embeddings. The left plot shows the t-SNE transformation of $5000$ day-string embeddings, coloured by their cluster value. On the upper row of the right-hand figures, we show the t-SNE embeddings for the $4$ participants with the most data, where a line indicates consecutive days. On the lower row, we show the daily cluster values over time for the same participants.
  • Figure 3: Day similarity. Each image shows the cosine similarity between every $20$th recorded day for $10$ participants. Blue, white, and red correspond to a cosine similarity of $1$, $0$, and $-1$ respectively. The range of days is represented by each axis of the plots and is given by $n$.
  • Figure 4: Number of days recorded. Histogram showing the number of participants with a given number of days of data recorded.
  • Figure 5: Time of location recordings. Histogram showing the number of sensor recordings by time of day.
  • ...and 5 more figures
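
The preprocessing described in the Figure 1 caption can be sketched as follows: bucket one day's location recordings into 20-minute windows, take the mode of each window, and emit "Nowhere" for windows with no recordings. The input layout (a list of `(datetime, location)` pairs), the space separator, and the tie-breaking rule are assumptions for illustration; the caption does not specify them.

```python
from collections import Counter
from datetime import datetime

WINDOW_MINUTES = 20      # window length from the Figure 1 caption
NO_ACTIVITY = "Nowhere"  # no-activity token from the Figure 1 caption


def day_to_string(events, sep=" "):
    """Convert one day's (datetime, location) events into a text string.

    Each 20-minute window contributes its most frequent location, or
    NO_ACTIVITY if the window is empty. Ties go to the location seen
    first in the window (an assumption).
    """
    n_windows = 24 * 60 // WINDOW_MINUTES  # 72 windows per day
    buckets = [[] for _ in range(n_windows)]
    for timestamp, location in events:
        minute_of_day = timestamp.hour * 60 + timestamp.minute
        buckets[minute_of_day // WINDOW_MINUTES].append(location)

    tokens = []
    for bucket in buckets:
        if bucket:
            tokens.append(Counter(bucket).most_common(1)[0][0])
        else:
            tokens.append(NO_ACTIVITY)
    return sep.join(tokens)
```

With single-word location names, the resulting string has 72 tokens per day; multi-word names would need a different separator or a normalisation step.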