Fairness in Computational Innovations: Identifying Bias in Substance Use Treatment Length of Stay Prediction Models with Policy Implications
Ugur Kursuncu, Aaron Baird, Yusen Xia
TL;DR
This study investigates bias in machine-predicted length of stay (LOS) for substance use disorder (SUD) treatment using the 2019 Treatment Episode Data Set for Discharges (TEDS-D). It develops LOS classifiers for inpatient and outpatient settings, applies feature selection to 28 variables, and evaluates fairness with Fairlearn at both the group and subgroup levels. The results identify race, U.S. region, substance type, DSM diagnosis, and payment source as the primary drivers of unfairness, with substantial disparities in several subgroups, particularly among racial minorities and in certain regions. The authors propose a two-tier bias-mitigation framework (Model Adjustment and Social Inclusion), discuss policy implications (data collection, calibration of false negatives, transparency), and argue for governance structures to ensure fair and equitable deployment of predictive LOS tools in health care.
Abstract
Predictive machine learning (ML) models are computational innovations that can enhance medical decision-making, including helping to determine the optimal timing of patient discharge. However, societal biases can be encoded into such models, raising concerns about inadvertently affecting health outcomes for disadvantaged groups. This issue is particularly pressing in the context of substance use disorder (SUD) treatment, where biases in predictive models could significantly impact the recovery of highly vulnerable patients. In this study, we focus on the development and assessment of ML models designed to predict the length of stay (LOS) for both inpatients (i.e., residential) and outpatients undergoing SUD treatment. We utilize the Treatment Episode Data Set for Discharges (TEDS-D) from the Substance Abuse and Mental Health Services Administration (SAMHSA). Through the lenses of distributive justice and socio-relational fairness, we assess our models for bias across variables related to demographics (e.g., race) as well as medical (e.g., diagnosis) and financial conditions (e.g., insurance). We find that race, U.S. geographic region, type of substance used, diagnosis, and payment source for treatment are primary indicators of unfairness. From a policy perspective, we provide bias mitigation strategies to achieve fair outcomes. We discuss the implications of these findings for medical decision-making and health equity. Ultimately, we seek to contribute to the innovation and policy-making literature by advancing the broader objectives of social justice when applying computational innovations in health care.
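To make the idea of a group-level bias assessment concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a binarized LOS label (1 for a hypothetical "longer stay" class, 0 otherwise) and computes the false negative rate separately for each group of a sensitive attribute, then reports the largest gap between groups. The function names, the toy labels, and the group names "A" and "B" are all illustrative assumptions; the data are invented and do not come from TEDS-D. Libraries such as Fairlearn provide comparable disaggregated metrics, but this sketch uses only the Python standard library.

```python
from collections import defaultdict


def false_negative_rate_by_group(y_true, y_pred, groups):
    """Return {group: FNR}, where FNR = FN / (FN + TP) among actual positives.

    Groups with no actual positives are omitted, since FNR is undefined there.
    """
    false_negatives = defaultdict(int)
    positives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            if pred == 0:
                false_negatives[group] += 1
    return {g: false_negatives[g] / positives[g] for g in positives}


def max_disparity(rates):
    """Largest difference in a metric between any two groups."""
    return max(rates.values()) - min(rates.values())


# Toy, invented data: 1 = hypothetical "longer stay" class, 0 = otherwise.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
groups = ["A"] * 5 + ["B"] * 5  # hypothetical sensitive-attribute values

rates = false_negative_rate_by_group(y_true, y_pred, groups)
gap = max_disparity(rates)
print(rates)  # per-group false negative rates
print(gap)    # difference between the worst and best group
```

In this toy example the model misses one of four actual positives in group A and two of four in group B, so group B's false negative rate is twice that of group A. A gap of this kind is what a group-level fairness audit is meant to surface; a subgroup-level audit would repeat the same calculation over intersections of attributes (for example, race within region).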
