Camera Measurement of Blood Oxygen Saturation

Jiankai Tang, Xin Liu, Daniel McDuff, Zhang Jiang, Hongming Hu, Luxi Zhou, Nodoka Nagao, Haruta Suzuki, Yuki Nagahama, Wei Li, Linhong Ji, Yuanchun Shi, Izumi Nishidate, Yuntao Wang

TL;DR

This work tackles non-invasive $SpO_2$ estimation from video by introducing a deep learning framework that fuses facial video frames with color-calibration cues. The VC2S model, combined with careful calibration strategies and ROI-based preprocessing, achieves robust intra- and inter-dataset performance across diverse skin tones, ages, and environments, outperforming traditional signal-processing baselines. Key contributions include comprehensive ablation analyses, demonstration of the importance of calibration (including a color checker) for generalization, and a detailed methodology for dataset collection and processing. The approach holds promise for scalable remote health monitoring, while highlighting the ongoing challenge of calibration requirements for real-world deployment across populations and devices.

Abstract

Blood oxygen saturation (SpO2) is a crucial vital sign routinely monitored in medical settings. Traditional methods require dedicated contact sensors, limiting accessibility and comfort. This study presents a deep learning framework for contactless SpO2 measurement using an off-the-shelf camera, addressing challenges related to lighting variations and skin tone diversity. We conducted two large-scale studies with diverse participants and evaluated our method against traditional signal processing approaches in intra- and inter-dataset scenarios. Our approach demonstrated consistent accuracy across demographic groups, highlighting the feasibility of camera-based SpO2 monitoring as a scalable and non-invasive tool for remote health assessment.

Paper Structure

This paper contains 24 sections, 16 equations, 7 figures, 10 tables.

Figures (7)

  • Figure 1: Principle of Camera-based SpO$_2$ Measurement. The camera captures the skin tissue, which is illuminated by a light source. Beneath the skin tissue, blood vessels contain oxygenated and deoxygenated hemoglobin. The RGB values in each pixel can be transformed into the concentrations of oxygenated hemoglobin (HbO) and deoxygenated hemoglobin (HbR) using a skin tissue model [nishidate2022rgb]. The tissue oxygen saturation (StO$_2$) and the percutaneous arterial oxygen saturation (SpO$_2$) can then be calculated from the HbO and HbR concentrations.
  • Figure 2: Examples of Predicted SpO$_2$ and Ground Truth SpO$_2$ across three datasets and two evaluation setups. A) Intra-dataset and inter-dataset evaluation on the THU dataset. B) Intra-dataset and inter-dataset evaluation on the TUAT V1 dataset. C) Intra-dataset and inter-dataset evaluation on the TUAT V2 dataset.
  • Figure 3: Study Apparatus. A) Diagram of study apparatus design. B) Example from the TUAT datasets. C) Example from the THU dataset.
  • Figure 4: Preprocessing and Evaluation Pipeline. A) The preprocessing pipeline includes region of interest (ROI) segmentation, tracking, and resizing. B) The Leave-One-Out Cross-Validation (LOOCV) evaluation pipeline for intra-dataset testing. C) The cross-dataset evaluation pipeline for inter-dataset testing.
  • Figure 5: VC2S Neural Model. The model consists of two input branches: a single-frame ROI and its corresponding color checker. It outputs an SpO$_2$ value for the frame. The model architecture includes 2D convolutional layers with ReLU activation and max pooling, followed by adaptive pooling, flattening, and a fully connected layer.
  • ...and 2 more figures
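
The last step described in the Figure 1 caption, going from hemoglobin concentrations to a saturation percentage, can be illustrated with a minimal sketch. It assumes the standard definition of oxygen saturation as the oxygenated fraction of total hemoglobin. The function name `oxygen_saturation` and its parameters are hypothetical and not taken from the paper. The sketch covers neither the skin tissue model that maps RGB values to concentrations nor the paper's mapping from StO$_2$ to SpO$_2$.

```python
def oxygen_saturation(c_hbo: float, c_hbr: float) -> float:
    """Return oxygen saturation in percent.

    c_hbo: concentration of oxygenated hemoglobin (HbO)
    c_hbr: concentration of deoxygenated hemoglobin (HbR)
    Both values must use the same unit; the unit cancels in the ratio.
    This is an illustrative helper, not the paper's implementation.
    """
    if c_hbo < 0 or c_hbr < 0:
        raise ValueError("concentrations must be non-negative")
    total = c_hbo + c_hbr
    if total == 0:
        raise ValueError("total hemoglobin concentration is zero")
    # Saturation is the oxygenated share of total hemoglobin.
    return 100.0 * c_hbo / total


if __name__ == "__main__":
    # Example with made-up concentrations in arbitrary units.
    print(f"{oxygen_saturation(3.0, 1.0):.1f} %")
```

Because the result is a ratio, only the relative amounts of HbO and HbR matter. A per-pixel estimate would apply the same ratio to each pixel's concentration pair.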