Deep Learning for Metabolic Rate Estimation from Biosignals: A Comparative Study of Architectures and Signal Selection

Sarvenaz Babakhani, David Remy, Alina Roitberg

TL;DR

This study systematically compares ML and DL architectures for energy expenditure (EE) estimation from wearable biosignals, using single, paired, and grouped signals on the Ingraham dataset. The Transformer achieves the lowest RMSE (0.87 W/kg) on minute-ventilation input, and minute ventilation is the strongest single predictor across activities. Signal fusion (Local+Global, Hexoskin) and careful pairings further reduce error but reveal strong inter-subject variability. The work provides practical guidelines for model and input choice in real-world EE estimation and releases code to support reproducibility. Overall, architecture choice and signal selection jointly determine generalization across activities and subjects.

Abstract

Energy expenditure estimation aims to infer human metabolic rate from physiological signals such as heart rate, respiration, or accelerometer data, and has been studied primarily with classical regression methods. The few existing deep learning approaches rarely disentangle the role of neural architecture from that of signal choice. In this work, we systematically evaluate both aspects. We compare classical baselines with newer neural architectures across single signals, signal pairs, and grouped sensor inputs for diverse physical activities. Our results show that minute ventilation is the most predictive individual signal, with a transformer model achieving the lowest root mean square error (RMSE) of 0.87 W/kg across all activities. Paired and grouped signals, such as those from the Hexoskin smart shirt (five signals), offer good alternatives for faster models like CNN and ResNet with attention. Per-activity evaluation revealed mixed outcomes: notably better results in low-intensity activities (RMSE down to 0.29 W/kg; NRMSE = 0.04), while higher-intensity tasks showed larger RMSE but more comparable normalized errors. Finally, subject-level analysis highlights strong inter-individual variability, motivating the need for adaptive modeling strategies. Our code and models will be publicly available at https://github.com/Sarvibabakhani/deeplearning-biosignals-ee .
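The per-activity comparison of RMSE and normalized RMSE described in the abstract can be illustrated with a minimal sketch. The normalization used for NRMSE is not stated in this excerpt, so dividing by the mean ground-truth value is an assumption here, as are the function names and the toy values (which are made up and are not results from the paper).

```python
import math
from collections import defaultdict


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences (e.g. W/kg)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def nrmse(y_true, y_pred):
    """RMSE divided by the mean ground truth.

    ASSUMPTION: mean-normalization; the source does not specify which
    normalization (mean, range, ...) its NRMSE uses.
    """
    mean_true = sum(y_true) / len(y_true)
    return rmse(y_true, y_pred) / mean_true


def per_activity_errors(records):
    """Group (activity, true, predicted) triples and score each activity."""
    grouped = defaultdict(lambda: ([], []))
    for activity, true_value, predicted in records:
        grouped[activity][0].append(true_value)
        grouped[activity][1].append(predicted)
    return {
        activity: {"rmse": rmse(t, p), "nrmse": nrmse(t, p)}
        for activity, (t, p) in grouped.items()
    }


# Hypothetical toy data: (activity, measured EE, predicted EE) in W/kg.
records = [
    ("sitting", 1.0, 1.1),
    ("sitting", 1.2, 1.1),
    ("running", 10.0, 9.0),
    ("running", 12.0, 13.0),
]
errors = per_activity_errors(records)
for activity, scores in errors.items():
    print(f"{activity}: RMSE={scores['rmse']:.3f} W/kg, NRMSE={scores['nrmse']:.3f}")
```

In this toy case the high-intensity activity has a tenfold larger RMSE than the low-intensity one, yet both have the same normalized error, which is the kind of pattern the abstract refers to when it says higher-intensity tasks show larger RMSE but more comparable normalized errors.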

Paper Structure

This paper contains 22 sections, 1 equation, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Multimodal physiological signal processing pipeline for EE. Wearable sensors placed across the body collect multimodal signals. These signals are processed and fed as input into multiple neural network architectures. (Image of sensor placement on the body is adapted from Ingraham et al.)
  • Figure 2: Heatmap of RMSE values using Minute Ventilation (MV) alone and in combination with secondary signals (rows). The columns correspond to different prediction models. Lower RMSE values (lighter colors) indicate better predictive performance.
  • Figure 3: Model performance across different activities and conditions. The x-axis shows six activities with variations in speed or resistance. Model types are distinguished by color, while input type (single and grouped) is indicated by marker shape.
  • Figure 4: Model performance using alternative input signals for Minute Ventilation. The left panel shows results from paired signal combinations, while the right panel shows single-signal inputs. Different models are represented by distinct colors and markers.
  • Figure 5: Performance of Transformer and CNN for single and grouped signals across 10 subjects. The dashed line separates single- from grouped-signals in each plot. Boxplots represent the distribution of RMSE values across subjects: median (line), 25th–75th percentiles (box), and whiskers to 1.5×IQR. (MV: Minute Ventilation)
  • ...and 1 more figures