
Benchmarking ResNet for Short-Term Hypoglycemia Classification with DiaData

Beyza Cinar, Maria Maleshkova

TL;DR

This work addresses the challenge of reliable hypoglycemia prediction in Type 1 Diabetes by improving the quality of the DiaData CGM collection and benchmarking a state-of-the-art 1D-ResNet model that classifies hypoglycemia onset up to 2 hours ahead. It introduces a quality-improvement pipeline (IQR-based outlier masking, followed by gap-length-dependent imputation with linear and Stineman interpolation) and a five-class time-to-hypoglycemia labeling scheme. It then shows that both data quality and data quantity improve predictive performance (2–3% gains from quality refinement and ~7% gains from more training data). A correlation analysis between glucose and heart rate reveals moderate associations 15–60 minutes before hypoglycemia, suggesting that multimodal data may carry additional predictive signal. Overall, the findings underscore the practical value of data cleaning and of larger, heterogeneous datasets for more reliable early warning of hypoglycemic events in real-world diabetes management.
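The following minimal Python sketch illustrates the idea behind a five-class time-to-hypoglycemia labeling scheme. Several details are hypothetical and are not the paper's published definition: the class boundaries (30-minute bins up to 2 hours, plus a fifth class for "no onset within 2 hours"), the 5-minute CGM sampling interval, the assumption of a gap-free series, and the helper names `minutes_to_next_hypo` and `time_to_hypo_label`. Only the 70 mg/dL threshold and the 2-hour horizon come from the text.

```python
from typing import List, Optional

# Hypothetical settings for illustration; not the paper's published definition.
SAMPLE_MIN = 5                     # assumed CGM sampling interval (minutes)
HYPO_MG_DL = 70.0                  # hypoglycemia threshold (<= 70 mg/dL)
BIN_EDGES_MIN = [30, 60, 90, 120]  # assumed upper edges of classes 0-3 (minutes)


def minutes_to_next_hypo(glucose: List[float], t: int) -> Optional[int]:
    """Minutes from index t to the first later reading <= HYPO_MG_DL, or None."""
    for k in range(t + 1, len(glucose)):
        if glucose[k] <= HYPO_MG_DL:
            return (k - t) * SAMPLE_MIN
    return None


def time_to_hypo_label(glucose: List[float], t: int) -> int:
    """Five classes: 0 = onset in (0, 30] min, 1 = (30, 60], 2 = (60, 90],
    3 = (90, 120], 4 = no onset within 120 min (hypothetical binning)."""
    delta = minutes_to_next_hypo(glucose, t)
    if delta is not None:
        for label, edge in enumerate(BIN_EDGES_MIN):
            if delta <= edge:
                return label
    return len(BIN_EDGES_MIN)  # class 4: no hypoglycemia within the horizon


# Usage: a hypoglycemic reading 35 minutes after t = 0 falls into class 1.
print(time_to_hypo_label([120.0] * 7 + [65.0], 0))  # → 1
```

In a training pipeline, each label would typically be attached to the input window that ends at index `t`. This sketch does not exclude windows whose last reading is already hypoglycemic.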

Abstract

Individualized therapy is driven forward by medical data analysis, which provides insight into the patient's context. In particular, for Type 1 Diabetes (T1D), an autoimmune disease, relationships between demographics, sensor data, and context can be analyzed. However, outliers, noisy data, and small data volumes prevent reliable analysis, so the research domain requires large volumes of high-quality data. Moreover, missing values can lead to information loss. To address these limitations, this study improves the data quality of DiaData, an integration of 15 separate datasets containing glucose values from 2510 subjects with T1D. Specifically, we make the following contributions: 1) Outliers are identified with the interquartile range (IQR) approach and replaced with missing values. 2) Small gaps ($\le$ 25 min) are imputed with linear interpolation and larger gaps ($\ge$ 30 and $<$ 120 min) with Stineman interpolation. Based on a visual comparison, Stineman interpolation provides more realistic glucose estimates than linear interpolation for larger gaps. 3) After data cleaning, the correlation between glucose and heart rate is analyzed, revealing a moderate correlation between 15 and 60 minutes before hypoglycemia ($\le$ 70 mg/dL). 4) Finally, a benchmark for hypoglycemia classification is provided with a state-of-the-art ResNet model. The model is trained on the Maindatabase and Subdatabase II of DiaData to classify hypoglycemia onset up to 2 hours in advance. Training with more data improves performance by 7%, while using quality-refined data yields a 2–3% gain compared to raw data.
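Contribution 1 (IQR-based outlier detection with replacement by missing values) can be sketched in standard-library Python as shown below. Several choices are assumptions of this sketch, because the abstract does not state the multiplier, the quantile method, or the grouping level: the conventional 1.5 × IQR fences, the `inclusive` quantile method of `statistics.quantiles`, computing the fences per glucose series, and the function names. Outliers become `None`, which mirrors their replacement with missing values.

```python
import statistics
from typing import List, Optional, Tuple


def iqr_fences(values: List[float], k: float = 1.5) -> Tuple[float, float]:
    """Lower/upper fences Q1 - k*IQR and Q3 + k*IQR (k = 1.5 is an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def mask_outliers(glucose: List[Optional[float]], k: float = 1.5) -> List[Optional[float]]:
    """Replace readings outside the IQR fences with None (missing)."""
    observed = [g for g in glucose if g is not None]
    low, high = iqr_fences(observed, k)
    return [None if g is None or not (low <= g <= high) else g for g in glucose]


series = [100.0, 102.0, 98.0, 101.0, 99.0, 450.0, None, 100.0]
print(iqr_fences([g for g in series if g is not None]))  # → (96.5, 104.5)
print(mask_outliers(series))
# → [100.0, 102.0, 98.0, 101.0, 99.0, None, None, 100.0]
```

Masking rather than deleting outliers keeps the time axis intact, so the resulting gaps can be handled by the same imputation step as sensor dropouts.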
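Contribution 2 chooses an interpolation method according to gap length. The Python sketch below does this under stated assumptions. It assumes 5-minute CGM sampling and measures a gap as (number of missing readings) × 5 min. Under that rule, 1–5 missing readings are filled linearly and 6–23 with Stineman interpolation. Gaps of 24 or more readings (≥ 120 min) stay missing, and so do leading or trailing gaps, which have an observation on one side only. The Stineman part follows Stineman's (1980) rational interpolant, with the slope at each knot taken from the circle through that knot and its two neighbors. Using the one-sided secant slope at the series endpoints is a simplification of this sketch. All names are hypothetical, and this is not the authors' implementation. For reference, the R package imputeTS exposes the method as `na_interpolation(x, option = "stine")`.

```python
import bisect
from typing import List, Optional, Sequence

SAMPLE_MIN = 5           # assumed CGM sampling interval (minutes)
LINEAR_MAX_MIN = 25      # gaps <= 25 min: linear interpolation
STINEMAN_MAX_MIN = 120   # gaps >= 30 and < 120 min: Stineman; longer gaps stay missing


def stineman_slopes(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Slope at each knot from the circle through it and its two neighbors.

    Endpoint slopes use the one-sided secant (a simplification in this sketch).
    """
    n = len(x)
    dx = [x[i + 1] - x[i] for i in range(n - 1)]
    dy = [y[i + 1] - y[i] for i in range(n - 1)]
    yp = [dy[0] / dx[0]] + [0.0] * (n - 2) + [dy[-1] / dx[-1]]
    for i in range(1, n - 1):
        left = dx[i - 1] ** 2 + dy[i - 1] ** 2   # squared length, left segment
        right = dx[i] ** 2 + dy[i] ** 2          # squared length, right segment
        yp[i] = (dy[i - 1] * right + dy[i] * left) / (dx[i - 1] * right + dx[i] * left)
    return yp


def stineman_eval(x: List[float], y: List[float], yp: List[float], xo: float) -> float:
    """Evaluate Stineman's rational interpolant at x[0] <= xo <= x[-1]."""
    j = min(bisect.bisect_right(x, xo) - 1, len(x) - 2)  # interval [x[j], x[j+1]]
    h = x[j + 1] - x[j]
    s = (y[j + 1] - y[j]) / h                 # secant slope of the interval
    d1, d2 = xo - x[j], xo - x[j + 1]
    y0 = y[j] + s * d1                        # linear interpolation
    dy1 = (yp[j] - s) * d1                    # left tangent line minus linear
    dy2 = (yp[j + 1] - s) * d2                # right tangent line minus linear
    prod = dy1 * dy2
    if prod > 0:
        return y0 + prod / (dy1 + dy2)
    if prod < 0:
        return y0 + prod * (d1 + d2) / ((dy1 - dy2) * h)
    return y0


def impute_gaps(glucose: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps by length; leading, trailing and >= 120 min gaps stay None."""
    out = list(glucose)
    obs = [i for i, g in enumerate(glucose) if g is not None]
    if len(obs) < 2:
        return out
    x = [float(i) for i in obs]
    y = [float(glucose[i]) for i in obs]
    yp = stineman_slopes(x, y)
    for a, b in zip(obs, obs[1:]):            # consecutive observed indices
        gap_min = (b - a - 1) * SAMPLE_MIN    # gap length = missing readings x 5 min
        if gap_min == 0 or gap_min >= STINEMAN_MAX_MIN:
            continue
        for i in range(a + 1, b):
            if gap_min <= LINEAR_MAX_MIN:
                t = (i - a) / (b - a)
                out[i] = glucose[a] + t * (glucose[b] - glucose[a])
            else:
                out[i] = stineman_eval(x, y, yp, float(i))
    return out
```

The two branches use different amounts of context. The Stineman slopes are computed over all observed readings, so the readings on both sides of a gap shape the curvature of the fill. The linear branch uses only the two readings that bound the gap. On collinear data both branches reduce to the same straight line.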

Paper Structure

This paper contains 12 sections, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: Missingness in Glucose Levels Before Data Imputation
  • Figure 2: Missingness in Glucose Levels After Data Imputation
  • Figure 3: Raw vs. Imputed Data for Subject 190.0 RT-CGM.
  • Figure 4: Confusion Matrices