Scalable Numerical Embeddings for Multivariate Time Series: Enhancing Healthcare Data Representation Learning

Chun-Kai Huang; Yi-Hsien Hsieh; Ta-Jung Chien; Li-Cheng Chien; Shao-Hua Sun; Tung-Hung Su; Jia-Horng Kao; Che Lin

TL;DR

This work tackles irregular, asynchronously sampled multivariate time series with pervasive missing values in healthcare data. It introduces SCANE, which treats each observed value as a token by combining feature-type embeddings with a scaling by the observed value, enabling an imputation-free representation: $\mathrm{SCANE}(x'_{i,j}, m_{i,j}) = (x'_{i,j} \cdot m_{i,j}) \boldsymbol{u}_j$. Coupling SCANE with a Transformer Encoder yields SUMMIT, a scalable, imputation-free classifier that uses masking to ignore missing entries and a revised rollout attention for interpretability. Across three EHR datasets with high missingness, SUMMIT achieves state-of-the-art AUPRC and provides clinically aligned insights, indicating broad applicability to MTS analysis beyond healthcare.
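
The SCANE formula above can be illustrated with a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`init_feature_embeddings`, `scane`, `embed_timestep`), the Gaussian initialization, and the embedding dimension are hypothetical, and $x'_{i,j}$ is assumed to be an already preprocessed scalar.

```python
import random

def init_feature_embeddings(num_features, dim, seed=0):
    """Hypothetical stand-in for the learnable feature embeddings u_j.

    In a trained model these vectors would be learned; here they are
    just random values so the sketch is runnable.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_features)]

def scane(x, m, u_j):
    """SCANE(x, m) = (x * m) * u_j for one feature value.

    x   : preprocessed observed value (any placeholder if missing)
    m   : mask, 1 if the value was observed, 0 if it is missing
    u_j : embedding vector of feature type j
    """
    scale = x * m  # a missing entry (m = 0) collapses to the zero vector
    return [scale * w for w in u_j]

def embed_timestep(values, mask, U):
    """Embed every feature of one timestamp as its own token."""
    return [scane(x, m, u_j) for x, m, u_j in zip(values, mask, U)]

# Three features at one timestamp; the second one is missing.
U = init_feature_embeddings(num_features=3, dim=4)
tokens = embed_timestep([0.7, 0.0, -1.2], [1, 0, 1], U)
```

Because the embedding is a scaled copy of a single per-feature vector, a value never seen during training still maps to a well-defined point along that vector, which is one way to read the "scalable" claim.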

Abstract

Multivariate time series (MTS) data, when sampled irregularly and asynchronously, often present extensive missing values. Conventional methodologies for MTS analysis tend to rely on temporal embeddings based on timestamps that necessitate subsequent imputations, yet these imputed values frequently deviate substantially from their actual counterparts, thereby compromising prediction accuracy. Furthermore, these methods typically fail to provide robust initial embeddings for values infrequently observed or even absent within the training set, posing significant challenges to model generalizability. In response to these challenges, we propose SCAlable Numerical Embedding (SCANE), a novel framework that treats each feature value as an independent token, effectively bypassing the need for imputation. SCANE regularizes the traits of distinct feature embeddings and enhances representational learning through a scalable embedding mechanism. Coupling SCANE with the Transformer Encoder architecture, we develop the Scalable nUMerical eMbeddIng Transformer (SUMMIT), which is engineered to deliver precise predictive outputs for MTS characterized by prevalent missing entries. Our experimental validation, conducted across three disparate electronic health record (EHR) datasets marked by elevated missing value frequencies, confirms the superior performance of SUMMIT over contemporary state-of-the-art approaches addressing similar challenges. These results substantiate the efficacy of SCANE and SUMMIT, underscoring their potential applicability across a broad spectrum of MTS data analytical tasks.
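
To make the imputation-free idea concrete, the sketch below shows one common way an attention layer can ignore missing tokens: their scores are excluded before the softmax, so they receive exactly zero weight. This is an assumption about how such masking could be done, not a description of SUMMIT's actual code; `masked_softmax` is a hypothetical helper.

```python
import math

def masked_softmax(scores, mask):
    """Softmax over attention scores that skips masked-out tokens.

    scores : raw attention scores, one per token
    mask   : 1 for observed tokens, 0 for missing ones
    Missing tokens get weight 0.0; if nothing is observed, all weights are 0.0.
    """
    kept = [s for s, m in zip(scores, mask) if m]
    if not kept:
        return [0.0] * len(scores)
    top = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]

# The middle token is missing, so its large score is ignored.
weights = masked_softmax([1.0, 5.0, 1.0], [1, 0, 1])
# weights == [0.5, 0.0, 0.5]
```

The point of the design is that no placeholder value has to be invented for a missing measurement: the token simply does not take part in the weighted sum.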

Paper Structure (27 sections, 9 equations, 5 figures, 5 tables)

Introduction
Related Work
Methodology
Scalable Numerical Embedding
Transformer Encoder with Scalable Numerical Embedding
Revised Rollout Attention
Experiment and Result
Datasets
MIMIC-III (MI3)
PhysioNet2012 (P12)
Hepatocellular Carcinoma Dataset (HCC)
Models
Experimental Settings
Metrics
Results and Discussion
...and 12 more sections

Figures (5)

Figure 1: Embedding Multivariate Time Series Values. The figure illustrates irregularly and asynchronously sampled MTS data with three variables ($x_1$-$x_3$) and five timestamps ($t_1$-$t_5$). The x marks represent missing values, and colored dots are observations. "Each Value as A Token (EVAT)" only embeds observations and bypasses missing values.
Figure 2: Scalable Numerical Embedding: We take the value "age 41.82 years old" as an example to explain SCANE, which consists of two steps: mapping and scaling. First, the "age" feature type indicator is mapped to the feature embedding $\boldsymbol{u}_j$ via a learnable function $f$. Second, this feature embedding $\boldsymbol{u}_j$ is scaled by its observed value. When the value is missing, the embedding is set to the zero vector.
Figure 3: Attention Weights Visualization. The number in each cell is the rank of the corresponding attention weight: the smaller the number, the larger the weight. (a) Rollout Attention on the Attention Weights from the First Two Stacks of SCANE. (b) Revised Rollout Attention on the Attention Weights from the First Two Stacks of SCANE. (c) Revised Rollout Attention on All of SUMMIT's Attention Stacks. The columns are ordered by each feature’s mean rank across the timestamps.
Figure 4: Feature Embedding Visualization: We visualize the feature embeddings from HCC in this plot. The purple dots represent the hepatocellular-carcinoma-related features suggested by our partner medical experts. For the full feature names, please refer to Appendix \ref{table:ntuh_feature}.
Figure 5: EVAT Implementation Comparison: The setting in this ablation study follows Section \ref{sec:setup}. Beyond metric performance, SCANE is also the most parameter-efficient of the EVAT implementations compared, ahead of the naive alternatives.
