On the Bias, Fairness, and Bias Mitigation for a Wearable-based Freezing of Gait Detection in Parkinson's Disease

Timothy Odonga, Christine D. Esper, Stewart A. Factor, J. Lucas McKay, Hyeokhyen Kwon

TL;DR

This work evaluates bias and fairness in wearable-based FOG detection for PD across multiple datasets and demographics, revealing persistent disparities (DPR and EOR < 0.8) when no mitigation is used. Conventional fairness methods offer limited and inconsistent improvements, whereas transfer learning from multi-site data and generic human activity representations consistently enhance both performance (F1) and fairness metrics across attributes. The findings underscore the value of diverse, multi-site pretraining to obtain fairer, more generalizable FOG detectors and highlight the need for broader fairness evaluations, including intersectionality and clinically informed fairness thresholds. Overall, multi-site and foundation-like representations emerge as promising strategies to deploy HAR-based health analytics more equitably in real-world PD care.

Abstract

Freezing of gait (FOG) is a debilitating feature of Parkinson's disease (PD) and a cause of injurious falls among PD patients. Recent advances in wearable-based human activity recognition (HAR) technology have enabled the detection of FOG subtypes across benchmark datasets. Because FOG manifestation is heterogeneous, it is important to develop models that quantify FOG consistently across patients with varying demographics, FOG types, and PD conditions. Bias and fairness in FOG models remain understudied in HAR, with research focused mainly on FOG detection using single benchmark datasets. We evaluated the bias and fairness of HAR models for wearable-based FOG detection across demographics and PD conditions using multiple datasets, and we assessed the effectiveness of transfer learning as a potential bias mitigation approach. Our evaluation using demographic parity ratio (DPR) and equalized odds ratio (EOR) showed model bias (DPR and EOR < 0.8) for all stratified demographic variables, including age, sex, and disease duration. Our experiments demonstrated that transfer learning from multi-site datasets and from generic human activity representations significantly improved fairness (average change in DPR of +0.027 and +0.039, respectively) and performance (average change in F1-score of +0.026 and +0.018, respectively) across attributes, supporting the hypothesis that generic human activity representations are fairer and applicable to health analytics.
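To make the two fairness metrics concrete, the following is a minimal sketch of how DPR and EOR could be computed for a binary FOG detector. It assumes the commonly used definitions (as in the Fairlearn library): DPR is the lowest group selection rate divided by the highest, and EOR is the smaller of the corresponding min/max ratios for the true positive rate and the false positive rate. The summary above does not spell out the exact formulas, so these definitions, the function names, and the toy labels are all illustrative assumptions rather than the authors' implementation.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, TPR, and FPR for binary labels in {0, 1}."""
    tallies = defaultdict(
        lambda: {"n": 0, "sel": 0, "pos": 0, "tp": 0, "neg": 0, "fp": 0}
    )
    for t, p, g in zip(y_true, y_pred, groups):
        c = tallies[g]
        c["n"] += 1
        c["sel"] += p          # predicted FOG
        if t == 1:
            c["pos"] += 1
            c["tp"] += p       # FOG window correctly flagged
        else:
            c["neg"] += 1
            c["fp"] += p       # non-FOG window wrongly flagged
    rates = {}
    for g, c in tallies.items():
        if c["pos"] == 0 or c["neg"] == 0:
            raise ValueError(f"group {g!r} needs both FOG and non-FOG windows")
        rates[g] = {
            "selection": c["sel"] / c["n"],
            "tpr": c["tp"] / c["pos"],
            "fpr": c["fp"] / c["neg"],
        }
    return rates


def min_max_ratio(values):
    """Ratio of the smallest to the largest group rate (1.0 means parity)."""
    values = list(values)
    hi = max(values)
    if hi == 0:
        raise ValueError("ratio undefined when every group rate is zero")
    return min(values) / hi


def demographic_parity_ratio(rates):
    return min_max_ratio(r["selection"] for r in rates.values())


def equalized_odds_ratio(rates):
    tpr_ratio = min_max_ratio(r["tpr"] for r in rates.values())
    fpr_ratio = min_max_ratio(r["fpr"] for r in rates.values())
    return min(tpr_ratio, fpr_ratio)


# Hypothetical toy data: two groups (e.g. two age strata), six windows each.
y_true = [1, 1, 0, 0, 0, 0] + [1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0] + [1, 0, 1, 0, 0, 0]
groups = ["A"] * 6 + ["B"] * 6

rates = group_rates(y_true, y_pred, groups)
dpr = demographic_parity_ratio(rates)
eor = equalized_odds_ratio(rates)
FAIRNESS_THRESHOLD = 0.8  # threshold quoted in the abstract
biased = dpr < FAIRNESS_THRESHOLD or eor < FAIRNESS_THRESHOLD
print(f"DPR={dpr:.3f} EOR={eor:.3f} biased={biased}")
```

In this toy case the detector flags group A twice as often as group B, so both ratios fall below the 0.8 threshold. A value of 1.0 for either metric would indicate identical rates across groups.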

Paper Structure

This paper contains 59 sections, 2 equations, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Model bias evaluation pipeline for FOG detection tasks. In this work, we analyze the group fairness of the model based on sensitive attributes in FOG and PD, including age, sex, and disease duration.