A Fuzzy Logic-Based Framework for Explainable Machine Learning in Big Data Analytics

Farjana Yesmin, Nusrat Shirmin

TL;DR

The paper tackles the challenge of interpretable and fair ML in big data, particularly environmental monitoring where uncertainty and regulatory constraints matter. It proposes an intrinsic explainability framework that fuses type-2 fuzzy clustering, granular computing, and rule-based explanations to produce human-readable insights while assessing fairness. Key contributions include a type-2 fuzzy clustering method that improves cluster cohesion (silhouette) by about $4\%$ and reduces unfairness indicated by entropy by roughly $12\%$ relative to type-1 methods, the integration of fairness metrics into unsupervised learning, a rule-based explanation module with average coverage $0.65$ and significance $0.82$, and a scalable, linear-runtime implementation. Empirical results on the UCI Air Quality dataset show superior interpretability, fairness, and efficiency compared with baselines like DBSCAN and Agglomerative Clustering, suggesting practical value for big-data environmental analytics and regulatory compliance.

Abstract

The growing complexity of machine learning (ML) models in big data analytics, especially in domains such as environmental monitoring, highlights the critical need for interpretability and explainability to promote trust, ethical considerations, and regulatory adherence (e.g., GDPR). Traditional "black-box" models obstruct transparency, whereas post-hoc explainable AI (XAI) techniques like LIME and SHAP frequently compromise accuracy or fail to deliver inherent insights. This paper presents a novel framework that combines type-2 fuzzy sets, granular computing, and clustering to boost explainability and fairness in big data environments. When applied to the UCI Air Quality dataset, the framework effectively manages uncertainty in noisy sensor data, produces linguistic rules, and assesses fairness using silhouette scores and entropy. Key contributions encompass: (1) A type-2 fuzzy clustering approach that enhances cohesion by about 4% compared to type-1 methods (silhouette 0.365 vs. 0.349) and improves fairness (entropy 0.918); (2) Incorporation of fairness measures to mitigate biases in unsupervised scenarios; (3) A rule-based component for intrinsic XAI, achieving an average coverage of 0.65; (4) Scalable assessments showing linear runtime (roughly 0.005 seconds for sampled big data sizes). Experimental outcomes reveal superior performance relative to baselines such as DBSCAN and Agglomerative Clustering in terms of interpretability, fairness, and efficiency. Notably, the proposed method achieves a 4% improvement in silhouette score over type-1 fuzzy clustering and outperforms baselines in fairness (entropy reduction by up to 1%) and efficiency.
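The abstract reports silhouette and entropy figures but does not define how they are computed. The following is a minimal sketch, not the authors' implementation. It assumes the fairness entropy is the Shannon entropy of cluster-size proportions normalized by the log of the number of clusters, and that silhouette uses Euclidean distance. The function names `normalized_entropy`, `silhouette_score`, and `relative_improvement` are hypothetical, as is the toy data.

```python
import math
from collections import Counter


def normalized_entropy(labels):
    """Shannon entropy of cluster proportions, divided by log(k) so it lies in [0, 1].

    Assumption: this is one plausible reading of the 'entropy' fairness measure;
    the paper's exact definition is not given in this section.
    """
    counts = Counter(labels)
    k = len(counts)
    if k < 2:
        return 0.0
    n = len(labels)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)


def silhouette_score(points, labels):
    """Mean silhouette coefficient with Euclidean distance (O(n^2), small samples only)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(labels)


def relative_improvement(new, old):
    """Relative change of `new` over `old`, e.g. 0.05 means 5% higher."""
    return (new - old) / old


# Toy data (hypothetical): two well-separated, equally sized clusters.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
score = silhouette_score(points, labels)
balance = normalized_entropy(labels)
gain = relative_improvement(0.365, 0.349)  # silhouette values quoted in the abstract
```

Under these assumptions, a balanced partition yields a normalized entropy near 1 and a skewed one yields a value closer to 0. Applied to the silhouette values quoted in the abstract (0.365 vs. 0.349), `relative_improvement` gives roughly 0.046, in line with the "about 4%" the authors report.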

Paper Structure

This paper contains 20 sections, 1 equation, 16 figures, 1 table, 1 algorithm.

Figures (16)

  • Figure 1: Layered architecture of the proposed framework, showing the flow from data input to decision-making with fairness checks.
  • Figure 2: Visualization of clusters in a 2D PCA projection (explained variance: 0.82). Cluster 1 (blue) represents high pollution levels with $\mathrm{CO} > 2\,\mathrm{mg}/\mathrm{m}^3$, justifying the choice of $c=3$ clusters based on this clear separation.
  • Figure 3: Heatmap of cluster centers (without numerical values). Cluster 1 shows elevated CO and NOx, indicating poor air quality.
  • Figure 4: Cluster visualization using 2D PCA (explained variance: 0.82), showing distinct separation.
  • Figure 5: DBSCAN clustering results. Many points are labeled as noise (gray), indicating reduced clustering performance.
  • ...and 11 more figures