Ethical and Explainable AI in Reusable MLOps Pipelines

Rakib Hossain, Mahmood Menon Khan, Lisan Al Amin, Dhruv Parikh, Farhana Afroz, Bestoun S. Ahmed

TL;DR

The results show that automated fairness gates and explainability artefacts can be deployed in production without disrupting operational flow. This gives organizations a practical, credible way to implement ethical, transparent, and trustworthy AI across diverse datasets and operational settings.

Abstract

This paper introduces a unified machine learning operations (MLOps) framework that puts ethical artificial intelligence principles into practice by enforcing fairness, explainability, and governance throughout the machine learning lifecycle. The proposed method reduces bias, lowering the demographic parity difference (DPD) from 0.31 to 0.04 without model retuning, and cross-dataset validation achieves an area under the curve (AUC) of 0.89 on the Statlog Heart dataset. The framework keeps fairness metrics within operational limits across all deployments. Deployment is blocked if either the DPD or the equalized odds (EO) difference exceeds 0.05 on the validation set. After deployment, retraining is triggered automatically if the 30-day Kolmogorov-Smirnov (KS) drift statistic exceeds 0.20. In production, the system consistently achieved DPD ≤ 0.05 and EO ≤ 0.03, and the KS statistic remained ≤ 0.20. Decision-curve analysis shows a positive net benefit across the 10 to 20 percent risk-threshold operating range, indicating that the mitigated model preserves predictive utility while satisfying fairness constraints. These results demonstrate that automated fairness gates and explainability artefacts can be deployed in production without disrupting operational flow, providing organizations with a practical and credible approach to implementing ethical, transparent, and trustworthy AI across diverse datasets and operational settings.
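The gating rules described in the abstract can be sketched in plain Python. This is a minimal, stdlib-only sketch under stated assumptions, not the authors' implementation. The `Policy` class and the `deployment_gate` and `needs_retraining` helpers are hypothetical names, because the source does not give the contents of policy.yaml. EO is assumed to be the larger of the between-group true-positive-rate and false-positive-rate gaps. Drift is assumed to be a two-sample KS statistic that compares a reference sample of one feature or score against the last 30 days of production values.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, Hashable, Sequence


@dataclass(frozen=True)
class Policy:
    """Hypothetical mirror of the gate limits; the real policy.yaml schema is not given."""
    max_dpd: float = 0.05  # block deployment if DPD exceeds this
    max_eo: float = 0.05   # block deployment if EO exceeds this
    max_ks: float = 0.20   # retrain if 30-day KS drift exceeds this


def _rate(values: Sequence[int]) -> float:
    """Fraction of ones in a 0/1 sequence (0.0 if the sequence is empty)."""
    return sum(values) / len(values) if values else 0.0


def demographic_parity_difference(y_pred: Sequence[int], groups: Sequence[Hashable]) -> float:
    """Largest gap in positive-prediction rate between any two groups."""
    rates = [_rate([p for p, g in zip(y_pred, groups) if g == grp]) for grp in set(groups)]
    return max(rates) - min(rates)


def equalized_odds_difference(y_true, y_pred, groups) -> float:
    """Larger of the between-group TPR gap and FPR gap (assumed EO definition)."""
    gaps = []
    for label in (1, 0):  # label 1 gives per-group TPR, label 0 gives per-group FPR
        rates = [
            _rate([p for t, p, g in zip(y_true, y_pred, groups) if g == grp and t == label])
            for grp in set(groups)
        ]
        gaps.append(max(rates) - min(rates))
    return max(gaps)


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: sup over x of |F_ref(x) - F_cur(x)|."""
    ref, cur = sorted(reference), sorted(current)
    # Both empirical CDFs are step functions that only jump at sample points,
    # so evaluating at every pooled sample point finds the supremum.
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        for x in ref + cur
    )


def deployment_gate(y_true, y_pred, groups, policy: Policy = Policy()) -> Dict[str, object]:
    """Pre-deployment check on the validation set: approve only if DPD and EO are within limits."""
    dpd = demographic_parity_difference(y_pred, groups)
    eo = equalized_odds_difference(y_true, y_pred, groups)
    return {"dpd": dpd, "eo": eo, "approved": dpd <= policy.max_dpd and eo <= policy.max_eo}


def needs_retraining(reference, last_30_days, policy: Policy = Policy()) -> bool:
    """Post-deployment check: retrain if the 30-day KS drift statistic exceeds the limit."""
    return ks_statistic(reference, last_30_days) > policy.max_ks
```

In a CI job, an `approved` value of `False` would fail the pipeline step before the model reaches the registry. A `True` result from `needs_retraining` would be the signal that starts a retraining run.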

Paper Structure

This paper contains 28 sections, 6 figures, 8 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Ethical & Explainable MLOps pipeline. Models pass through fairness audits and a CI/CD gate (policy.yaml). SHAP explanations and an Assurance Pack (model card, datasheet, attestation) are logged; approved models enter the registry and are deployed. Monitoring tracks drift, and violations trigger automatic retraining.
  • Figure 2: Decision-curve analysis (validation; Kaggle cardiovascular cohort). Net Benefit (NB) vs risk threshold for the Baseline and Mitigated models, compared with Treat-All and Treat-None. The shaded area marks the 10–20% operating band. The Baseline and Mitigated curves overlap, indicating that utility is preserved under mitigation.
  • Figure 3: Fairness before/after reweighting. The dashed line at DPD=0.10 marks the analysis threshold used in the audit; the deployment gates are stricter at DPD=0.05 and EO=0.05.
  • Figure 4: Daily KS drift scores over a 30-day period. The shaded band marks the drift gate (KS ≤ 0.20); any violation triggers retraining.
  • Figure 5: Global SHAP feature importance on Kaggle test set.
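Figures 2 and 3 rest on two standard computations, sketched here in stdlib Python. The source says only "reweighting", so the weighting scheme is an assumption. It uses Kamiran-Calders reweighing, which gives each (group, label) cell the weight P(A=a)P(Y=y)/P(A=a, Y=y) so that group and label are independent in the weighted training data. Such weights are typically passed to an estimator through a `sample_weight` argument to `fit`. The net-benefit function uses the standard decision-curve formula NB(p_t) = TP/n - (FP/n) * p_t/(1 - p_t), and Treat-None is fixed at NB = 0. All function names are illustrative.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def reweighing_weights(y_true: Sequence[int], groups: Sequence[Hashable]) -> List[float]:
    """Per-sample weights w(a, y) = P(A=a) * P(Y=y) / P(A=a, Y=y).

    Assumed scheme (Kamiran-Calders reweighing); the source only says "reweighting".
    """
    n = len(y_true)
    n_a = Counter(groups)                # samples per group
    n_y = Counter(y_true)                # samples per label
    n_ay = Counter(zip(groups, y_true))  # samples per (group, label) cell
    return [n_a[a] * n_y[y] / (n * n_ay[(a, y)]) for a, y in zip(groups, y_true)]


def net_benefit(y_true: Sequence[int], risk: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit at threshold p_t in (0, 1): TP/n - FP/n * p_t / (1 - p_t)."""
    n = len(y_true)
    treated = [t for t, r in zip(y_true, risk) if r >= threshold]  # labels of treated cases
    tp = sum(1 for t in treated if t == 1)
    fp = len(treated) - tp
    return tp / n - fp / n * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy that treats every case; Treat-None is always 0."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

Evaluating `net_benefit` for the baseline and mitigated risk scores at thresholds from 0.10 to 0.20, alongside `treat_all_net_benefit` and zero, gives the kind of comparison plotted in the shaded band of Figure 2.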