ViFusionTST: Deep Fusion of Time-Series Image Representations from Load Signals for Early Bed-Exit Prediction

Hao Liu, Yu Hu, Rakiba Rayhana, Ling Bai, Zheng Liu

TL;DR

This paper tackles early bed-exit prediction using a single, privacy-friendly load cell under a bed leg. It introduces ViFusionTST, a dual-stream Swin Transformer that processes a line-plot image and a three-channel texture-map image (RP, MTF, GAF) in parallel and fuses them with cross-attention to learn modality weights. Evaluated on six months of real-world data from 95 beds, the method achieves an accuracy of 0.885 and an F1 score of 0.794, outperforming strong 1D and 2D time-series baselines on F1, recall, accuracy, and AUPRC. The work demonstrates that image-based time-series fusion on low-cost sensors can deliver real-time, reliable bed-exit predictions with privacy and scalability advantages, offering a practical path toward reducing hospital falls.
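The three texture maps named above (recurrence plot, Markov transition field, Gramian angular field) can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline: the recurrence threshold `eps`, the number of quantile bins for the MTF, the choice of the summation variant of the GAF (GASF), and the omission of delay embedding and image resizing are all assumptions, and the helper names (`texture_image` and the rest) are hypothetical. For an n-sample window, each map is an n × n image, and the three are stacked as the channels of one three-channel image.

```python
import numpy as np

# Hypothetical sketch of RP / MTF / GAF encoding for one load-cell window;
# eps, n_bins, and the GASF variant are assumptions, not the paper's settings.

def minmax(x, lo=-1.0, hi=1.0):
    """Linearly rescale a 1D array into [lo, hi] (constant input maps to the midpoint)."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.full_like(x, (lo + hi) / 2)
    return lo + (x - x.min()) * (hi - lo) / span

def recurrence_plot(x, eps=0.1):
    """Binary RP on scalar states (no delay embedding): 1 where |x_i - x_j| <= eps."""
    u = minmax(x, 0.0, 1.0)
    return (np.abs(u[:, None] - u[None, :]) <= eps).astype(float)

def markov_transition_field(x, n_bins=8):
    """MTF: entry (i, j) is the estimated probability of moving from bin(x_i) to bin(x_j)."""
    x = np.asarray(x, dtype=float)
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])  # interior quantile edges
    q = np.digitize(x, edges)  # bin index per sample, in 0..n_bins-1
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(q[:-1], q[1:]):  # count first-order transitions
        W[a, b] += 1
    rows = W.sum(axis=1, keepdims=True)
    W = np.divide(W, rows, out=np.zeros_like(W), where=rows > 0)  # row-normalize
    return W[q[:, None], q[None, :]]

def gramian_angular_field(x):
    """GASF: cos(phi_i + phi_j), where phi = arccos of x rescaled to [-1, 1]."""
    phi = np.arccos(np.clip(minmax(x), -1.0, 1.0))
    return np.cos(phi[:, None] + phi[None, :])

def texture_image(x):
    """Stack RP, MTF, and GASF (mapped to [0, 1]) into an (n, n, 3) image."""
    rp = recurrence_plot(x)
    mtf = markov_transition_field(x)
    gaf = (gramian_angular_field(x) + 1.0) / 2.0
    return np.stack([rp, mtf, gaf], axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a 64-sample load window (not real sensor data).
    window = 50 + 5 * np.sin(np.linspace(0, 4 * np.pi, 64)) + rng.normal(0, 0.5, 64)
    print(texture_image(window).shape)  # (64, 64, 3)
```

Stacking the maps as channels lets a standard three-channel image backbone consume all three encodings at once; a real pipeline would typically also resize the result to the backbone's input resolution.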

Abstract

Bed-related falls remain a major source of injury in hospitals and long-term care facilities, yet many commercial alarms trigger only after a patient has already left the bed. We show that early bed-exit intent can be predicted using only one low-cost load cell mounted under a bed leg. The resulting load signals are first converted into a compact set of complementary images: an RGB line plot that preserves raw waveforms and three texture maps (recurrence plot, Markov transition field, and Gramian angular field) that expose higher-order dynamics. We introduce ViFusionTST, a dual-stream Swin Transformer that processes the line plot and texture maps in parallel and fuses them through cross-attention to learn data-driven modality weights. To provide a realistic benchmark, we collected six months of continuous data from 95 beds in a long-term care facility. On this real-world dataset, ViFusionTST reaches an accuracy of 0.885 and an F1 score of 0.794, surpassing recent 1D and 2D time-series baselines across F1, recall, accuracy, and AUPRC. The results demonstrate that image-based fusion of load-sensor signals for time-series classification is a practical and effective solution for real-time, privacy-preserving fall prevention.
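One way a cross-attention fusion with learned modality weights could work is sketched below in NumPy. The exact ViFusionTST fusion block is not described in this section, so every design choice here is an assumption: single-head attention, residual connections, mean pooling, a softmax gate over the two pooled modality vectors, and the hypothetical names `fuse`, `cross_attention`, and `params`. Random matrices stand in for trained weights, and random token arrays stand in for the outputs of the line-plot and texture-map Swin Transformer streams.

```python
import numpy as np

# Hypothetical sketch of dual-stream cross-attention fusion; not the ViFusionTST code.

def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_tokens, kv_tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product attention: queries from one stream, keys/values from the other."""
    Q, K, V = q_tokens @ Wq, kv_tokens @ Wk, kv_tokens @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (n_q, n_kv)
    return softmax(scores, axis=-1) @ V      # (n_q, d)

def fuse(line_tokens, tex_tokens, params):
    """Fuse two token sets and return (fused vector, modality weights)."""
    # Each stream attends to the other; residual connections keep its own features.
    line_ctx = line_tokens + cross_attention(line_tokens, tex_tokens, *params["line_to_tex"])
    tex_ctx = tex_tokens + cross_attention(tex_tokens, line_tokens, *params["tex_to_line"])
    # Mean-pool each stream to a single d-dimensional vector.
    pooled = np.stack([line_ctx.mean(axis=0), tex_ctx.mean(axis=0)])  # (2, d)
    # Score each modality and normalize the scores into weights summing to 1.
    weights = softmax(pooled @ params["gate"])  # (2,)
    return weights @ pooled, weights            # (d,), (2,)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_tokens = 32, 49  # hypothetical sizes (e.g. a 7 x 7 token grid)
    params = {
        "line_to_tex": [rng.normal(0, d ** -0.5, (d, d)) for _ in range(3)],
        "tex_to_line": [rng.normal(0, d ** -0.5, (d, d)) for _ in range(3)],
        "gate": rng.normal(0, d ** -0.5, d),
    }
    line_tokens = rng.normal(size=(n_tokens, d))  # stand-in for line-plot stream output
    tex_tokens = rng.normal(size=(n_tokens, d))   # stand-in for texture-map stream output
    fused, weights = fuse(line_tokens, tex_tokens, params)
    print(fused.shape, weights)
```

The softmax gate forces the two modality weights to be positive and sum to one, so in this sketch they can be read as the relative contribution of the line-plot and texture-map streams for a given input window.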

Paper Structure

This paper contains 25 sections, 6 equations, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Overview of the strain-gauge-based bed monitoring platform.
  • Figure 2: Sensor signals capturing the in-bed period and bed-exit transition.
  • Figure 3: ViFusionTST model architecture.
  • Figure 4: ViFusionTST prediction probability over time for a bed-exit event.