Machine Learning Techniques with Fairness for Prediction of Completion of Drug and Alcohol Rehabilitation

Karen Roberts-Licklider, Theodore Trafalis

TL;DR

This work tackles fairness-aware prediction of rehab completion and prior-treatment counts using SAMHSA data from Oklahoma, addressing bias across nine demographic variables. It combines aggressive preprocessing (one-hot encoding, SMOTEN balancing) with intersectional fairness via dual and three-way interactions, and introduces new worst-case fairness metrics to evaluate robustness. Across models, decision trees and random forests deliver strong predictive accuracy, while SVMs with tuned kernels provide competitive results; reweighting and intersectionalization generally improve fairness with limited loss in accuracy. The study offers a practical, scalable framework for fairness-conscious outcome prediction in health services, with implications for policy and resource allocation when evaluating rehabilitation programs.

Abstract

This study predicts whether a person will complete a drug and alcohol rehabilitation program and the number of times a person attends. It is based on demographic data from the Substance Abuse and Mental Health Services Administration (SAMHSA), drawn from both admissions and discharge records of drug and alcohol rehabilitation centers in Oklahoma. Because demographic data is highly categorical, binary encoding was used, and various fairness measures were applied to mitigate bias across nine demographic variables. Support vector machines with linear, polynomial, sigmoid, and radial basis function kernels were compared over a range of parameter values to find the optimal settings. These were then compared with decision trees, random forests, and neural networks. The Synthetic Minority Oversampling Technique for Nominal data (SMOTEN) was used to balance the categorical data, with imputation for missing values. The nine bias variables were then intersectionalized to mitigate bias, and the dual and triple interactions were integrated so that their probabilities could be used to examine worst-case ratio fairness mitigation. Disparate Impact, Statistical Parity Difference, Conditional Statistical Parity Ratio, Demographic Parity, Demographic Parity Ratio, Equalized Odds, Equalized Odds Ratio, Equal Opportunity, and Equalized Opportunity Ratio were all explored in both the binary and multiclass scenarios.
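To make the idea of a worst-case ratio over intersectional groups concrete, here is a minimal sketch in plain Python. It is not the authors' implementation, and the paper's own metric definitions may differ. It assumes a parity gap defined as the largest minus the smallest group selection rate, and a worst-case ratio defined as the smallest divided by the largest group selection rate. The attribute names `sex` and `age_group`, the toy records, and all function names are hypothetical illustrations and do not come from the SAMHSA data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: two demographic attributes and a binary prediction
# (1 = predicted to complete the program). Not taken from the SAMHSA data.
records = [
    {"sex": "F", "age_group": "young", "pred": 1},
    {"sex": "F", "age_group": "young", "pred": 1},
    {"sex": "F", "age_group": "old", "pred": 1},
    {"sex": "F", "age_group": "old", "pred": 0},
    {"sex": "M", "age_group": "young", "pred": 1},
    {"sex": "M", "age_group": "young", "pred": 0},
    {"sex": "M", "age_group": "old", "pred": 0},
    {"sex": "M", "age_group": "old", "pred": 1},
]


def selection_rates(rows, attrs, pred_key="pred"):
    """Share of positive predictions in each group defined jointly by attrs."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for row in rows:
        group = tuple(row[a] for a in attrs)
        counts[group][0] += row[pred_key]
        counts[group][1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


def parity_gap(rates):
    """Assumed definition: largest minus smallest group selection rate."""
    return max(rates.values()) - min(rates.values())


def worst_case_ratio(rates):
    """Assumed definition: smallest over largest group selection rate."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


def ratios_over_interactions(rows, attrs, max_order=3):
    """Worst-case ratio for every single, dual, and triple attribute combination."""
    results = {}
    for order in range(1, max_order + 1):
        for combo in combinations(attrs, order):
            results[combo] = worst_case_ratio(selection_rates(rows, combo))
    return results


if __name__ == "__main__":
    for combo, ratio in ratios_over_interactions(records, ("sex", "age_group")).items():
        print(combo, round(ratio, 3))
```

In this toy data, each attribute on its own looks less skewed than the two-way interaction does, which is the kind of disparity that dual and triple interactions are meant to surface. A ratio of 1.0 means every group has the same selection rate, and lower values mean a larger disparity.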

Paper Structure

This paper contains 35 sections, 37 equations, 47 figures, and 16 tables.

Figures (47)

  • Figure 1: Reason Codes
  • Figure 2: Services One Hot Encoding
  • Figure 3: Completed SMOTE Applied
  • Figure 4: NOPRIOR SMOTE Applied
  • Figure 5: Completed_NOPRIOR SMOTE Applied
  • ...and 42 more figures