Towards Characterizing Knowledge Distillation of PPG Heart Rate Estimation Models

Kanav Arora, Girish Narayanswamy, Shwetak Patel, Richard Li

TL;DR

The paper tackles the challenge of deploying high-performing PPG heart-rate estimation models on wearable devices by systematically distilling large pre-trained models into smaller, edge-friendly equivalents. It assesses four distillation strategies (hard, soft, DKD, feature) across varying teacher and student capacities using a 1D-ResNet backbone and 180-class BPM classification, evaluated on multiple free-living datasets. Results show that distilled models consistently outperform from-scratch baselines, with DKD providing the strongest gains and a predictable exponential scaling law governing performance as model size changes; ResNet-based students benefit more from distillation than MLPs. Additionally, distillation yields substantial reductions in memory and inference time, supporting practical edge deployment for real-time physiological sensing.

Abstract

Heart rate estimation from photoplethysmography (PPG) signals generated by wearable devices such as smartwatches and fitness trackers has significant implications for the health and well-being of individuals. Although prior work has demonstrated deep learning models with strong performance on the heart rate estimation task, deploying these models on wearable devices also requires that they adhere to strict memory and latency constraints. In this work, we explore and characterize how large pre-trained PPG models may be distilled into smaller models appropriate for real-time inference on the edge. We evaluate four distillation strategies through comprehensive sweeps of teacher and student model capacities: (1) hard distillation, (2) soft distillation, (3) decoupled knowledge distillation (DKD), and (4) feature distillation. We characterize the resulting scaling laws, which describe the relationship between model size and performance. This early investigation lays the groundwork for practical and predictable methods for building edge-deployable models for physiological sensing.
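
To make three of the four strategies listed above concrete, the following is a minimal NumPy sketch of the loss terms as they are commonly formulated for a 180-class classifier. It is not the authors' implementation: the temperature, the `alpha` and `beta` weights, the function names, and the random logits are all illustrative assumptions, and the DKD term follows the usual target-class / non-target-class decomposition rather than anything stated in this summary. Feature distillation is omitted because it depends on intermediate-layer shapes that are not described here.

```python
import numpy as np

NUM_CLASSES = 180  # class count from the source; the mapping of classes to BPM values is not stated


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def hard_distillation_loss(student_logits, teacher_logits):
    """Cross-entropy of the student against the teacher's argmax class."""
    pseudo_labels = teacher_logits.argmax(axis=-1)
    log_p = log_softmax(student_logits)
    return -log_p[np.arange(len(pseudo_labels)), pseudo_labels].mean()


def soft_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2."""
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1)
    return temperature ** 2 * kl.mean()


def dkd_loss(student_logits, teacher_logits, targets,
             alpha=1.0, beta=1.0, temperature=2.0):
    """Decoupled KD: weighted target-class KL plus non-target-class KL.

    alpha, beta, and temperature are placeholder values (assumptions).
    """
    rows = np.arange(len(targets))
    p_t = np.exp(log_softmax(teacher_logits, temperature))
    p_s = np.exp(log_softmax(student_logits, temperature))

    # Target-class term: binary KL between (p_target, 1 - p_target) pairs.
    t_tgt, s_tgt = p_t[rows, targets], p_s[rows, targets]
    tckd = (t_tgt * np.log(t_tgt / s_tgt)
            + (1 - t_tgt) * np.log((1 - t_tgt) / (1 - s_tgt)))

    # Non-target term: KL between distributions renormalized over the
    # remaining NUM_CLASSES - 1 classes.
    mask = np.ones(p_t.shape, dtype=bool)
    mask[rows, targets] = False
    q_t = (p_t / (1 - t_tgt)[:, None])[mask].reshape(len(targets), -1)
    q_s = (p_s / (1 - s_tgt)[:, None])[mask].reshape(len(targets), -1)
    nckd = (q_t * np.log(q_t / q_s)).sum(axis=-1)

    return temperature ** 2 * (alpha * tckd + beta * nckd).mean()


# Synthetic logits purely for illustration; these are not PPG model outputs.
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(4, NUM_CLASSES))
student_logits = rng.normal(size=(4, NUM_CLASSES))
targets = rng.integers(0, NUM_CLASSES, size=4)

hard = hard_distillation_loss(student_logits, teacher_logits)
soft = soft_distillation_loss(student_logits, teacher_logits)
dkd = dkd_loss(student_logits, teacher_logits, targets)
soft_same = soft_distillation_loss(teacher_logits, teacher_logits)
dkd_same = dkd_loss(teacher_logits, teacher_logits, targets)
```

In this sketch, both the soft and DKD losses vanish when the student reproduces the teacher's logits exactly, which is a quick sanity check on any implementation. In practice a distillation term like these is often combined with a ground-truth cross-entropy term; whether and how the paper does so is not stated in this summary.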

Paper Structure

This paper contains 4 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1:
  • Figure 3: Soft Distilled Model Scaling Behavior for ResNet and MLP Student Architectures. Performance analysis of student models trained via soft distillation across varying parameter counts. ResNet students (blue) demonstrate superior scaling efficiency and a significantly lower error floor compared to MLP students (orange), indicating a stronger inductive bias for the PPG task.