ViFusionTST: Deep Fusion of Time-Series Image Representations from Load Signals for Early Bed-Exit Prediction

Hao Liu; Yu Hu; Rakiba Rayhana; Ling Bai; Zheng Liu

TL;DR

This paper tackles early bed-exit prediction using a single, privacy-friendly load cell mounted under a bed leg. It introduces ViFusionTST, a dual-stream Swin Transformer that processes a line-plot image and a three-channel texture-map image (recurrence plot, Markov transition field, and Gramian angular field) in parallel and fuses them through cross-attention to learn data-driven modality weights. On six months of real-world data from 95 beds in a long-term-care facility, the method achieves an accuracy of 0.885 and an F1 score of 0.794, outperforming recent 1D and 2D time-series baselines on F1, recall, accuracy, and AUPRC. The work shows that image-based fusion of time-series signals from a low-cost sensor can deliver real-time bed-exit prediction with privacy and scalability advantages, offering a practical path toward reducing bed-related falls.

Abstract

Bed-related falls remain a major source of injury in hospitals and long-term care facilities, yet many commercial alarms trigger only after a patient has already left the bed. We show that early bed-exit intent can be predicted using only one low-cost load cell mounted under a bed leg. The resulting load signals are first converted into a compact set of complementary images: an RGB line plot that preserves raw waveforms, and three texture maps (recurrence plot, Markov transition field, and Gramian angular field) that expose higher-order dynamics. We introduce ViFusionTST, a dual-stream Swin Transformer that processes the line plot and texture maps in parallel and fuses them through cross-attention to learn data-driven modality weights. To provide a realistic benchmark, we collected six months of continuous data from 95 beds in a long-term-care facility. On this real-world dataset, ViFusionTST reaches an accuracy of 0.885 and an F1 score of 0.794, surpassing recent 1D and 2D time-series baselines across F1, recall, accuracy, and AUPRC. The results demonstrate that image-based fusion of load-sensor signals for time-series classification is a practical and effective solution for real-time, privacy-preserving fall prevention.
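
To make the three texture maps concrete, the following is a minimal NumPy sketch of how a one-dimensional load window could be turned into a three-channel recurrence plot / Markov transition field / Gramian angular field image. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not give the window length, recurrence threshold, number of quantile bins, or normalization, so `eps`, `n_bins`, the min-max scaling, the choice of the summation form of the Gramian angular field, and all function names here are hypothetical.

```python
import numpy as np


def minmax_scale(x, lo=-1.0, hi=1.0):
    """Rescale a 1D series to [lo, hi]; a constant series maps to the midpoint."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.full_like(x, (lo + hi) / 2.0)
    return lo + (hi - lo) * (x - x.min()) / span


def recurrence_plot(x, eps=0.1):
    """Binary recurrence plot: R[i, j] = 1 where |x_i - x_j| <= eps (eps is assumed)."""
    x = np.asarray(x, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])
    return (dist <= eps).astype(float)


def gramian_angular_field(x):
    """Gramian angular summation field: G[i, j] = cos(phi_i + phi_j)."""
    s = np.clip(minmax_scale(x), -1.0, 1.0)
    phi = np.arccos(s)  # polar-angle encoding of the rescaled values
    return np.cos(phi[:, None] + phi[None, :])


def markov_transition_field(x, n_bins=4):
    """MTF: M[i, j] = estimated P(bin(x_i) -> bin(x_j)) from first-order transitions."""
    x = np.asarray(x, dtype=float)
    # Interior quantile edges, so the bins are roughly equally populated.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(x, edges)  # integer labels in 0 .. n_bins - 1
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (bins[:-1], bins[1:]), 1.0)  # count consecutive transitions
    row_sums = counts.sum(axis=1, keepdims=True)
    trans = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return trans[bins[:, None], bins[None, :]]


def texture_image(x, eps=0.1, n_bins=4):
    """Stack RP, MTF, and GAF into an (N, N, 3) array with every channel in [0, 1]."""
    s = minmax_scale(x, 0.0, 1.0)
    rp = recurrence_plot(s, eps)
    mtf = markov_transition_field(s, n_bins)
    gaf = (gramian_angular_field(s) + 1.0) / 2.0  # map [-1, 1] to [0, 1]
    return np.stack([rp, mtf, gaf], axis=-1)
```

Calling `texture_image(window)` on a length-N window returns an N x N x 3 array that can be resized and fed to an image backbone alongside the line-plot image. Each pairwise map costs O(N^2) memory, so in practice the window would likely be downsampled first.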
