Clustering of timed sequences -- Application to the analysis of care pathways
Thomas Guyet, Pierre Pinson, Enoal Gesny
TL;DR
The paper addresses the clustering of timed sequences representing care pathways by introducing a drop-DTW-based metric tailored to timestamped events, with event types represented by probabilistic embeddings. It extends dynamic time warping to allow deletions (drops) and defines a DBA-inspired averaging procedure, with convergence guarantees, to compute representative timed sequences. The approach is validated on synthetic data and applied to real-world electronic health records from the OPTISOINS project, showing that drop-DTW-based clustering can yield clinically informative care-pathway patterns and can outperform TraMineR in identifying meaningful clusters. While promising, the method involves many parameters and substantial computation, which motivates future work on parameter guidance, scalable implementations, and clinical validation of the derived average pathways.
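To make the idea of "dynamic time warping with drops" concrete, the following is a minimal Python sketch of a drop-DTW-style dissimilarity between two timed sequences. It rests on several assumptions that are not taken from the paper: events are encoded as hypothetical (type, timestamp) pairs; the matching cost is a 0/1 type mismatch plus a weighted absolute time gap (the paper instead relies on event-type embeddings); and every event can be dropped at one constant cost. The names `drop_dtw`, `event_cost`, `drop_cost`, and `time_weight` are illustrative only, and the recurrence is a simplified variant, not the exact formulation used by the authors.

```python
from typing import List, Tuple

Event = Tuple[str, float]  # hypothetical encoding: (event type, timestamp in days)


def event_cost(a: Event, b: Event, time_weight: float = 0.1) -> float:
    """Assumed matching cost: 0/1 type mismatch plus a weighted time gap."""
    type_cost = 0.0 if a[0] == b[0] else 1.0
    return type_cost + time_weight * abs(a[1] - b[1])


def drop_dtw(x: List[Event], y: List[Event], drop_cost: float = 1.5,
             time_weight: float = 0.1) -> float:
    """Simplified drop-DTW-style dissimilarity between two timed sequences.

    Each event is either matched (possibly to several events, as in DTW
    warping) or dropped at a fixed cost.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        D[i][0] = i * drop_cost  # drop the first i events of x
    for j in range(1, m + 1):
        D[0][j] = j * drop_cost  # drop the first j events of y
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = event_cost(x[i - 1], y[j - 1], time_weight)
            D[i][j] = min(
                D[i - 1][j - 1] + c,      # match x_i with y_j
                D[i - 1][j] + c,          # warp: y_j also matched to x_i
                D[i][j - 1] + c,          # warp: x_i also matched to y_j
                D[i - 1][j] + drop_cost,  # drop x_i
                D[i][j - 1] + drop_cost,  # drop y_j
            )
    return D[n][m]
```

For example, with `drop_cost=1.0`, comparing `[("A", 0), ("X", 5), ("B", 10)]` with `[("A", 0), ("B", 10)]` costs exactly one drop (the spurious `X` event), whereas plain DTW would have to match `X` to one of the other events. The drop cost therefore controls how readily unusual events in a pathway are ignored rather than forced into an alignment.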
Abstract
Improving the future of healthcare starts with a better understanding of current practices in hospital settings. This motivates the objective of discovering typical care pathways from patient data, which can be achieved through clustering. Clustering care pathways, represented as sequences of timestamped events, is difficult because it requires both a semantically appropriate metric and suitable clustering algorithms. In this article, we adapt two methods developed for time series to the clustering of timed sequences: the drop-DTW metric and the DBA approach for constructing averaged timed sequences. These methods are then integrated into clustering algorithms, yielding original and sound algorithms for clustering timed sequences. The approach is evaluated on both synthetic and real-world data.
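The DBA principle (DTW Barycenter Averaging) can be illustrated by a minimal Python sketch adapted to timed sequences. It iterates one idea: align every sequence to the current average, then replace each average event by a summary of the events aligned to it. Several choices in this sketch are assumptions and not the paper's procedure: the hypothetical (type, timestamp) encoding and 0/1-plus-time-gap cost, a drop-DTW-style alignment in which dropped events contribute nothing, a majority vote for the averaged event type (the paper works with probabilistic event-type embeddings), a mean for the averaged timestamp, and keeping an average event unchanged when nothing is aligned to it. The names `align`, `dba_update`, and `dba` are illustrative, and no convergence guarantee is claimed for this simplified version.

```python
from collections import Counter, defaultdict
from typing import List, Optional, Tuple

Event = Tuple[str, float]  # hypothetical encoding: (event type, timestamp)


def cost(a: Event, b: Event, w: float = 0.1) -> float:
    # Assumed cost: 0/1 type mismatch plus weighted absolute time gap.
    return (0.0 if a[0] == b[0] else 1.0) + w * abs(a[1] - b[1])


def align(avg: List[Event], seq: List[Event], drop: float = 1.5) -> List[Tuple[int, int]]:
    """Return the matched (avg index, seq index) pairs of an optimal drop-DTW-style path."""
    n, m = len(avg), len(seq)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int, bool]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        D[i][0], back[i][0] = i * drop, (i - 1, 0, False)
    for j in range(1, m + 1):
        D[0][j], back[0][j] = j * drop, (0, j - 1, False)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(avg[i - 1], seq[j - 1])
            options = [
                (D[i - 1][j - 1] + c, (i - 1, j - 1, True)),  # match
                (D[i - 1][j] + c, (i - 1, j, True)),          # warp
                (D[i][j - 1] + c, (i, j - 1, True)),          # warp
                (D[i - 1][j] + drop, (i - 1, j, False)),      # drop avg event
                (D[i][j - 1] + drop, (i, j - 1, False)),      # drop seq event
            ]
            D[i][j], back[i][j] = min(options, key=lambda o: o[0])
    # Backtrack from (n, m) and collect only the matched cells.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj, matched = back[i][j]
        if matched:
            pairs.append((i - 1, j - 1))
        i, j = pi, pj
    return pairs


def dba_update(avg: List[Event], seqs: List[List[Event]], drop: float = 1.5) -> List[Event]:
    """One averaging step: summarize the events aligned to each average event."""
    types = defaultdict(Counter)
    times = defaultdict(list)
    for seq in seqs:
        for a_idx, s_idx in align(avg, seq, drop):
            types[a_idx][seq[s_idx][0]] += 1
            times[a_idx].append(seq[s_idx][1])
    new_avg = []
    for k, event in enumerate(avg):
        if times[k]:
            majority_type = types[k].most_common(1)[0][0]
            new_avg.append((majority_type, sum(times[k]) / len(times[k])))
        else:
            new_avg.append(event)  # nothing aligned: keep the previous event
    return new_avg


def dba(init: List[Event], seqs: List[List[Event]], n_iter: int = 10,
        drop: float = 1.5) -> List[Event]:
    """Repeat the update until the average stops changing or n_iter is reached."""
    avg = list(init)
    for _ in range(n_iter):
        new_avg = dba_update(avg, seqs, drop)
        if new_avg == avg:
            break
        avg = new_avg
    return avg
```

As in the original DBA, the length of the average sequence is fixed by its initialization. Averaging `[("A", 0), ("B", 10)]` and `[("A", 2), ("B", 12)]` from the first sequence yields `[("A", 1.0), ("B", 11.0)]`. In a k-means-like clustering loop, such an average would play the role of a cluster centroid.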
