A Fuzzy Logic-Based Framework for Explainable Machine Learning in Big Data Analytics
Farjana Yesmin, Nusrat Shirmin
TL;DR
The paper tackles the challenge of interpretable and fair ML in big data, particularly in environmental monitoring, where uncertainty and regulatory constraints matter. It proposes an intrinsic explainability framework that combines type-2 fuzzy clustering, granular computing, and rule-based explanations to produce human-readable insights while assessing fairness. Key contributions include a type-2 fuzzy clustering method that improves cluster cohesion (silhouette score) by about $4\%$ and reduces entropy-measured unfairness by roughly $12\%$ relative to type-1 methods, the integration of fairness metrics into unsupervised learning, a rule-based explanation module with average coverage $0.65$ and significance $0.82$, and a scalable implementation with linear runtime. Empirical results on the UCI Air Quality dataset show better interpretability, fairness, and efficiency than baselines such as DBSCAN and Agglomerative Clustering, suggesting practical value for big-data environmental analytics and regulatory compliance.
Abstract
The growing complexity of machine learning (ML) models in big data analytics, especially in domains such as environmental monitoring, highlights the critical need for interpretability and explainability to support trust, ethical use, and regulatory compliance (e.g., GDPR). Traditional "black-box" models obstruct transparency, whereas post-hoc explainable AI (XAI) techniques like LIME and SHAP frequently compromise accuracy or fail to deliver inherent insights. This paper presents a novel framework that combines type-2 fuzzy sets, granular computing, and clustering to boost explainability and fairness in big data environments. When applied to the UCI Air Quality dataset, the framework effectively manages uncertainty in noisy sensor data, produces linguistic rules, and assesses cluster quality and fairness using silhouette scores and entropy. Key contributions are: (1) a type-2 fuzzy clustering approach that enhances cohesion by about 4% compared to type-1 methods (silhouette 0.365 vs. 0.349) and improves fairness (entropy 0.918); (2) the incorporation of fairness measures to mitigate biases in unsupervised scenarios; (3) a rule-based component for intrinsic XAI, achieving an average coverage of 0.65; and (4) scalability assessments showing linear runtime (roughly 0.005 seconds for sampled big data sizes). Experimental outcomes reveal superior performance relative to baselines such as DBSCAN and Agglomerative Clustering in terms of interpretability, fairness (entropy reduction by up to 1%), and efficiency.
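The two summary figures quoted above, a relative silhouette gain and an entropy-based fairness score, can be illustrated with a minimal Python sketch. The function names `relative_improvement` and `cluster_entropy` are hypothetical, and defining fairness entropy as the Shannon entropy (in bits) of the cluster-size distribution is an assumption, since the abstract does not state the exact formula. The cluster labels are made up for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def relative_improvement(new: float, old: float) -> float:
    """Relative change of `new` over `old`, returned as a fraction."""
    return (new - old) / old


def cluster_entropy(labels) -> float:
    """Shannon entropy (bits) of the cluster-size distribution.

    Assumption: higher entropy means points are spread more evenly
    across clusters. The paper's own definition may differ.
    """
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Silhouette scores reported in the abstract: type-2 vs. type-1 clustering.
gain = relative_improvement(0.365, 0.349)
print(f"relative silhouette gain: {gain:.1%}")

# Hypothetical cluster assignments, for illustration only.
labels = [0] * 50 + [1] * 30 + [2] * 20
print(f"cluster-size entropy: {cluster_entropy(labels):.3f} bits")
```

With the reported scores, the ratio works out to a gain of between 4% and 5%, consistent with the "about 4%" figure in the abstract.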
