Clustering of timed sequences -- Application to the analysis of care pathways

Thomas Guyet; Pierre Pinson; Enoal Gesny

TL;DR

The paper addresses clustering of timed sequences representing care pathways by introducing a drop-DTW-based metric tailored for events with timestamps and probabilistic event-type embeddings. It extends dynamic time warping to allow deletions (drops) and defines a DBA-inspired averaging procedure to compute representative timed sequences, with convergence guarantees. The approach is validated on synthetic data and applied to real-world electronic health records from the OPTISOINS project, showing that drop-DTW-based clustering can yield clinically informative care-pathway patterns and can outperform TraMineR in identifying meaningful clusters. While promising, the method involves many parameters and substantial computation, motivating future work on parameter guidance, scalable implementations, and clinical validation of the derived average pathways.

Abstract

Improving the future of healthcare starts with a better understanding of current practices in hospital settings. This motivates the objective of discovering typical care pathways from patient data, which can be achieved through clustering. The difficulty in clustering care pathways, represented as sequences of timestamped events, lies in defining a semantically appropriate metric and suitable clustering algorithms. In this article, we adapt two methods developed for time series to the clustering of timed sequences: the drop-DTW metric and the DBA approach for constructing average timed sequences. These methods are then used within clustering algorithms to obtain original and sound clustering algorithms for timed sequences. The approach is evaluated on synthetic and real-world data.
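To make the idea of a DTW-style alignment with drops more concrete, the following is a minimal sketch and not the paper's exact recurrence. It assumes a timed sequence is a list of `(timestamp, event_type)` pairs, that the cost of pairing two events is a weighted sum of the time gap (weight `p_t`) and a 0/1 type mismatch (weight `p_e`), that any event can be dropped at a fixed `drop_cost`, and that two events can only be paired if their timestamps differ by at most `tau`. The function names and the exact cost form are illustrative assumptions.

```python
import math

def event_cost(a, b, p_t=1.0, p_e=1.0):
    """Assumed pairing cost: weighted time gap plus 0/1 event-type mismatch."""
    (ta, ea), (tb, eb) = a, b
    return p_t * abs(ta - tb) + p_e * (0.0 if ea == eb else 1.0)

def drop_dtw(s1, s2, drop_cost=1.0, tau=math.inf, p_t=1.0, p_e=1.0):
    """Simplified DTW with drops between two timed sequences.

    best[i][j]    : minimal cost of processing the first i events of s1
                    and the first j events of s2.
    matched[i][j] : same, but constrained to end with s1[i-1] paired
                    with s2[j-1] (needed for many-to-one warping).
    """
    n, m = len(s1), len(s2)
    inf = math.inf
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    matched = [[inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        best[i][0] = i * drop_cost  # all of s1's prefix is dropped
    for j in range(1, m + 1):
        best[0][j] = j * drop_cost  # all of s2's prefix is dropped
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = s1[i - 1], s2[j - 1]
            # Pairing is only allowed within the temporal constraint tau.
            if abs(a[0] - b[0]) <= tau:
                prev = min(best[i - 1][j - 1],   # fresh pair
                           matched[i - 1][j],    # b also paired with previous a
                           matched[i][j - 1])    # a also paired with previous b
                matched[i][j] = event_cost(a, b, p_t, p_e) + prev
            best[i][j] = min(matched[i][j],
                             best[i - 1][j] + drop_cost,   # drop a
                             best[i][j - 1] + drop_cost)   # drop b
    return best[n][m]

# Hypothetical pathways: S = surgery, C = consultation, P = physiotherapy.
s1 = [(0.0, "S"), (1.0, "C")]
s2 = [(0.0, "S"), (1.0, "C"), (5.0, "P")]
distance = drop_dtw(s1, s2, drop_cost=1.0, tau=3.5)
```

In this toy case the late `P` event cannot be paired with anything within `tau`, so the cheapest alignment pairs the first two events and drops `P`. In this sketch, raising `drop_cost` makes pairing distant events more attractive than dropping them, and the relative size of `p_t` and `p_e` controls whether timing or event type dominates the comparison.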

Paper Structure (18 sections, 14 equations, 4 figures, 2 tables, 2 algorithms)

Introduction
Timed Sequences and Probabilistic Timed Sequences
Comparing Timed Sequences
DTW and drop-DTW
Adaptation of drop-DTW for timed sequences
Construction of an Average Timed Sequence
Average Timed Sequences
Convergence of the Algorithm
Parameters
Clustering of Timed Sequences
Experiments and Results on Synthetic Data
Effects of the $\frac{p_t}{p_e}$ Ratio
Effect of Drop-Cost
Application to Real-World Care Pathways
Comparison with TraMineR
...and 3 more sections

Figures (4)

Figure 1: Illustration of the timed sequences of Example \ref{ex:ex1} (on the left) and the cost matrix for the alignment between $s_1$ and $s_2$ according to the drop-DTW (on the right). The alignment is illustrated by the grey bars on the left and by the colored cells in the matrix (on the right). The red crosses indicate that $P$ and $S$ have been dropped in this alignment. The cells at the corners are dark because they do not satisfy the constraint $\tau=3.5$. Clinical pathway interpretation: $S$ stands for surgery, $C$ for consultation, $P$ for physiotherapist session and $R$ for radiotherapy session.
Figure 2: Illustration of one averaging step. At the top, the alignment of the current average $s_r$ is computed with each sequence. Then, the "vertical" average of the set of events paired to one $s_r$ event yields an average probabilistic event in $s'_r$ (at the bottom).
Figure 3: Clusters obtained by the clustering methods, from left to right: HierAsTiSeq, K-means and TraMineR. Clusters are ordered row-wise by their size. Each cluster is represented by the histogram of event types over time. Each color corresponds to a specific type of event (see Figure \ref{fig:5cluster_K-means_clustiseq} for the color legend). The higher the bar, the more events there are at the given time across all patients. The vertical red line indicates the index date of the resection (0 delay).
Figure 4: Clusters of care pathways identified by K-means with drop-DTW on $3\,311$ patients. Each row is a cluster, with the representation of the barycenter on the left and the distribution of events over time on the right. Each color corresponds to a specific type of event.
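The "vertical" averaging step described in the Figure 2 caption can be illustrated with a small sketch. It assumes that the events paired to one event of the current average $s_r$ are given as `(timestamp, event_type)` pairs, and that the resulting probabilistic event is summarised by the arithmetic mean of the timestamps together with the empirical frequency of each event type. Both choices, and the function name `average_event`, are assumptions made for illustration rather than the paper's exact definition.

```python
from collections import Counter

def average_event(paired_events):
    """Average the events paired to a single event of the current average.

    Returns (mean timestamp, {event_type: probability}).
    Assumption: plain arithmetic mean and empirical type frequencies.
    """
    if not paired_events:
        raise ValueError("at least one paired event is required")
    n = len(paired_events)
    mean_time = sum(t for t, _ in paired_events) / n
    counts = Counter(event_type for _, event_type in paired_events)
    probabilities = {event_type: c / n for event_type, c in counts.items()}
    return mean_time, probabilities

# Hypothetical events from four patients, all paired to the same average event.
paired = [(1.0, "C"), (2.0, "C"), (3.0, "P"), (2.0, "C")]
mean_time, probabilities = average_event(paired)
```

Here the averaged event sits at time 2.0 and carries a distribution over event types (three quarters `C`, one quarter `P`) instead of a single type, which is what makes the averaged sequence probabilistic.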

Theorems & Definitions (4)

Example 1: Probabilistic representation of a timed sequence
Example 2: Drops representation in drop-DTW
Example 3: Illustration of drop-cost usefulness
Example 4: Iteration of the averaging algorithm
