Developing Fairness-Aware Task Decomposition to Improve Equity in Post-Spinal Fusion Complication Prediction

Yining Yuan, J. Ben Tamo, Wenqi Shi, Yishan Zhong, Micky C. Nnamdi, B. Randall Brenn, Steven W. Hwang, May D. Wang

TL;DR

The paper tackles fairness in postoperative complication prediction for spinal fusion by introducing FAIR-MTL, a fairness-aware multitask learning framework that discovers latent subgroups via unsupervised demographic embedding and routes predictions through subgroup-specific heads. This end-to-end approach combines inverse-frequency weighting and regularization to mitigate subgroup disparities while preserving strong predictive performance, validated on large clinical datasets and externally on INSPIRE. It provides interpretable AI outputs through SHAP and feature-importance analyses and demonstrates that subgroup-aware learning reduces demographic parity and equalized-odds gaps without sacrificing accuracy. The work advances clinically actionable, equitable risk stratification in spine surgery with robust external validation and ablation evidence underscoring the importance of each architectural component. Overall, FAIR-MTL offers a practical path toward fairer, more transparent surgical risk prediction systems with potential for broader clinical deployment.

Abstract

Fairness in clinical prediction models remains a persistent challenge, particularly in high-stakes applications such as spinal fusion surgery for scoliosis, where patient outcomes exhibit substantial heterogeneity. Many existing fairness approaches rely on coarse demographic adjustments or post-hoc corrections, which fail to capture the latent structure of clinical populations and may unintentionally reinforce bias. We propose FAIR-MTL, a fairness-aware multitask learning framework designed to provide equitable and fine-grained prediction of postoperative complication severity. Instead of relying on explicit sensitive attributes during model training, FAIR-MTL employs a data-driven subgroup inference mechanism. We extract a compact demographic embedding and apply k-means clustering to uncover latent patient subgroups that may be differentially affected by traditional models. These inferred subgroup labels determine task routing within a shared multitask architecture. During training, subgroup imbalance is mitigated through inverse-frequency weighting, and regularization prevents overfitting to smaller groups. Applied to postoperative complication prediction with four severity levels, FAIR-MTL achieves an AUC of 0.86 and an accuracy of 75%, outperforming single-task baselines while substantially reducing bias. For gender, the demographic parity difference decreases to 0.055 and the equalized odds difference to 0.094; for age, these values decrease to 0.056 and 0.148, respectively. Model interpretability is supported by SHAP and Gini importance analyses, which consistently highlight clinically meaningful predictors such as hemoglobin, hematocrit, and patient weight. Our findings show that incorporating unsupervised subgroup discovery into a multitask framework enables more equitable, interpretable, and clinically actionable predictions for surgical risk stratification.
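
To make the two fairness metrics quoted in the abstract concrete, the following is a minimal sketch of how a demographic parity difference and an equalized odds difference are commonly computed. It is an illustration under stated assumptions, not the authors' evaluation code: it uses a binary outcome (the paper predicts four severity levels, and its multiclass definition is not given here), it takes the equalized odds difference as the larger of the true-positive-rate gap and the false-positive-rate gap, and the function names and toy arrays are hypothetical.

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Largest gap in positive-prediction rate between any two groups."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return float(max(rates) - min(rates))

def equalized_odds_difference(y_true, y_pred, group):
    """Larger of the TPR gap and the FPR gap across groups (assumed definition)."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tprs, fprs = [], []
    for g in np.unique(group):
        in_group = group == g
        # Mean prediction among true positives is the TPR; among true negatives, the FPR.
        tprs.append(y_pred[in_group & (y_true == 1)].mean())
        fprs.append(y_pred[in_group & (y_true == 0)].mean())
    return float(max(max(tprs) - min(tprs), max(fprs) - min(fprs)))

# Hypothetical toy data: two groups of four patients each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

dpd = demographic_parity_difference(y_pred, group)
eod = equalized_odds_difference(y_true, y_pred, group)
print(dpd, eod)
```

Both metrics are 0 when the groups are treated identically under the respective criterion, so lower values, such as those reported in the abstract, indicate smaller gaps between groups.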

Paper Structure

This paper contains 21 sections, 9 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Overview of the proposed responsible AI framework for predicting postoperative complication severity, enhancing personalized surgery decision-making. 1. Feature Preprocessing and Label Definition: Pre-operative data, including patient history, lab results, medication, Lines/Drains/Airways, and patient basic information are processed to create a unified feature table. 2. Model Development and Validation: Five different machine learning models are developed and tuned using grid search to optimize hyperparameters. 3. Model Evaluation and Result Interpretation: The trained models are evaluated for fairness across age and gender, performance stability through bootstrap resampling, and feature importance using Gini importance and SHAP interpretations.
  • Figure 2: Overview of the proposed Fairness-Aware Multi-Task Learning Framework. Patient data are first processed to extract sensitive features, which are clustered using k-means to assign individuals into sensitive subgroups (e.g., Group 1 and Group 2). These subgroup labels are then used to route inputs through a shared neural network backbone into corresponding task-specific heads. Each task head specializes in learning patterns unique to its subgroup while sharing common representations. The final output predicts the severity of postoperative complications.
  • Figure 3: Data preprocessing workflow. The patient information table, initially containing 65,728 records, is filtered based on scoliosis diagnosis and spinal-related procedures, resulting in 2,059 patients. The final dataset includes 6,463 features and is split into training (N = 1,647) and test (N = 420) sets.
  • Figure 4: Subgroup accuracy comparison across machine learning models for predicting postoperative complication severity. FAIR-MTL shows more consistent accuracies across subgroups.
  • Figure 5: Gini feature importance plot for the Random Forest model, highlighting the top 40 features contributing to the model's predictions. The most influential features include 'hematocrit value,' 'operation start time (in or dtitm),' and 'weight,' followed by other significant predictors such as 'hemoglobin value,' 'age,' and 'hospital admission time.'
  • ...and 1 more figures
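
The subgroup routing in the Figure 2 caption and the inverse-frequency weighting mentioned in the abstract can be sketched roughly as follows. This is a toy NumPy illustration under several assumptions, not the FAIR-MTL implementation: the subgroup labels are taken as already produced by the clustering step, the backbone is a single ReLU layer, the weight formula N / (K * n_g) is one common inverse-frequency choice, and all names, shapes, and random values are hypothetical.

```python
import numpy as np

def inverse_frequency_weights(subgroup):
    """Per-sample weight N / (K * n_g), so smaller subgroups get larger weights."""
    subgroup = np.asarray(subgroup)
    labels, counts = np.unique(subgroup, return_counts=True)
    per_group = len(subgroup) / (len(labels) * counts)
    lookup = dict(zip(labels.tolist(), per_group.tolist()))
    return np.array([lookup[s] for s in subgroup.tolist()])

def route_through_heads(x, subgroup, w_shared, heads):
    """Apply a shared layer, then the head that matches each row's subgroup."""
    subgroup = np.asarray(subgroup)
    hidden = np.maximum(x @ w_shared, 0.0)  # shared representation (ReLU)
    logits = np.empty((x.shape[0], heads[0].shape[1]))
    for g, w_head in enumerate(heads):
        mask = subgroup == g
        logits[mask] = hidden[mask] @ w_head  # subgroup-specific head
    return logits

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 5))                # 6 patients, 5 features (toy)
subgroup = np.array([0, 0, 0, 0, 1, 1])    # assumed output of the clustering step
w_shared = rng.normal(size=(5, 8))         # shared backbone weights
heads = [rng.normal(size=(8, 4)) for _ in range(2)]  # 4 severity levels per head

weights = inverse_frequency_weights(subgroup)
logits = route_through_heads(x, subgroup, w_shared, heads)
print(weights)
print(logits.shape)
```

In a training loop, the per-sample weights would typically multiply each patient's loss term so that the smaller subgroup contributes as much to the objective as the larger one.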