Interpretable Fair Clustering

Mudi Jiang, Jiahui Zhou, Xinying Liu, Zengyou He, Zhikui Chen

TL;DR

This work addresses the need for interpretable clustering that also enforces group fairness. It proposes IFCT, a decision-tree–based framework that jointly optimizes intra-cluster compactness and fairness via $\mathcal{L}(\mathcal{T}) = \mathcal{L}_C(\mathcal{T}) + \lambda \mathcal{L}_F(\mathcal{T})$, and IFCT-P, a hyperparameter-free variant using post-pruning. The method supports mixed-type features and multiple sensitive attributes, and demonstrates competitive clustering performance with improved fairness and clear interpretability across real-world and synthetic datasets. Experiments show IFCT generally outperforms baselines on fairness while maintaining reasonable accuracy, and IFCT-P delivers robust performance without parameter tuning. The work offers a practical path toward transparent, fair clustering suitable for high-stakes applications.
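
The combined objective $\mathcal{L}(\mathcal{T}) = \mathcal{L}_C(\mathcal{T}) + \lambda \mathcal{L}_F(\mathcal{T})$ can be illustrated with a minimal sketch. The compactness term below is the usual sum of squared distances to each cluster centroid, restricted to numerical features. The fairness term is a hypothetical stand-in (cluster size times the L1 gap between a cluster's group proportions and the dataset-wide proportions), since the paper's exact definition is not reproduced in this summary. All function names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def compactness_loss(X):
    """Sum of squared distances to the centroid (numerical features only)."""
    if len(X) == 0:
        return 0.0
    return float(((X - X.mean(axis=0)) ** 2).sum())

def fairness_loss(groups, global_props):
    """ASSUMED penalty, not the paper's definition: cluster size times the
    L1 gap between in-cluster and dataset-wide group proportions."""
    n = len(groups)
    if n == 0:
        return 0.0
    gap = sum(abs(np.mean(groups == g) - p) for g, p in global_props.items())
    return float(n * gap)

def total_loss(X, groups, labels, lam):
    """L = L_C + lam * L_F, summed over the clusters given by `labels`."""
    values, counts = np.unique(groups, return_counts=True)
    global_props = dict(zip(values, counts / len(groups)))
    clusters = np.unique(labels)
    loss_c = sum(compactness_loss(X[labels == k]) for k in clusters)
    loss_f = sum(fairness_loss(groups[labels == k], global_props) for k in clusters)
    return loss_c + lam * loss_f

# Toy data: two tight point pairs, each containing one member of group "a" and one of "b".
X = np.array([[0.0], [0.0], [1.0], [1.0]])
groups = np.array(["a", "b", "a", "b"])
balanced = np.array([0, 0, 1, 1])   # compact and balanced clusters
skewed = np.array([0, 1, 0, 1])     # each cluster holds a single group
print(total_loss(X, groups, balanced, lam=0.5))
print(total_loss(X, groups, skewed, lam=0.5))
```

In this sketch, setting `lam` to zero reduces the objective to plain compactness, while larger values penalize clusters whose group composition drifts from the overall population.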

Abstract

Fair clustering has gained increasing attention in recent years, especially in applications involving socially sensitive attributes. However, existing fair clustering methods often lack interpretability, limiting their applicability in high-stakes scenarios where understanding the rationale behind clustering decisions is essential. In this work, we address this limitation by proposing an interpretable and fair clustering framework, which integrates fairness constraints into the structure of decision trees. Our approach constructs interpretable decision trees that partition the data while ensuring fair treatment across protected groups. To further enhance the practicality of our framework, we also introduce a variant that requires no fairness hyperparameter tuning, achieved through post-pruning a tree constructed without fairness constraints. Extensive experiments on both real-world and synthetic datasets demonstrate that our method not only delivers competitive clustering performance and improved fairness, but also offers additional advantages such as interpretability and the ability to handle multiple sensitive attributes. These strengths enable our method to perform robustly under complex fairness constraints, opening new possibilities for equitable and transparent clustering.

Paper Structure

This paper contains 27 sections, 9 equations, 4 figures, 4 tables, 2 algorithms.

Figures (4)

  • Figure 1: Illustration of the IFCT growth process: (a) A toy dataset comprising numerical attributes ($N_1,N_2,N_3$), categorical attributes ($C_1,C_2$), and sensitive attributes ($S_1,S_2$). (b) Each candidate split rule is evaluated based on the objective function, with $N_1 \leq 0$ shown as an example. The dataset is split into $D_L$ and $D_R$, and the total loss $\mathcal{L} (D_L)$ includes the compactness loss of numerical features $\mathcal{L}_{n} (D_L)$, categorical features $\mathcal{L}_{c} (D_L)$, and fairness regularization $\mathcal{L}_{F} (D_L)$, computed according to Eqs. (2), (3), and (5), respectively. The loss of $D_R$ is computed similarly. Given $\rho=\frac{3}{2+3}=0.6$, the rule with the highest gain $\Delta (D)$ is selected and added to the leaf set. (c) The leaf node with the maximum $\Delta$ is then selected for expansion using a best-first strategy, resulting in an updated tree.
  • Figure 2: Effect of the parameter $\lambda$ on all evaluation metrics for IFCT. Each metric is normalized to the range $[0,1]$ by dividing it by its maximum value across all $\lambda$ settings.
  • Figure 3: Visualization of decision trees constructed by IFCT and IFCT-P on the HCV dataset.
  • Figure 4: Running time comparison across different datasets.
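
The split evaluation described in the Figure 1 caption can be sketched roughly as follows. Each candidate rule of the form "feature $\leq$ threshold" divides a node's data into $D_L$ and $D_R$, and the rule with the highest gain is kept. The gain used here, $\mathcal{L}(D) - \mathcal{L}(D_L) - \mathcal{L}(D_R)$, is an assumed form, and the placeholder loss covers only numerical compactness. The paper's loss also includes categorical and fairness terms, and the role of $\rho$ is not modeled here. The names `sse` and `best_split` are hypothetical.

```python
import numpy as np

def sse(X):
    """Placeholder node loss: sum of squared distances to the centroid."""
    return float(((X - X.mean(axis=0)) ** 2).sum()) if len(X) else 0.0

def best_split(X, loss=sse):
    """Return (gain, feature, threshold) for the best rule x[feature] <= threshold.

    ASSUMPTION: gain = loss(D) - loss(D_L) - loss(D_R).
    Returns (0.0, None, None) when no rule improves on the parent node.
    """
    parent = loss(X)
    best = (0.0, None, None)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for t in values[:-1]:  # skip the largest value: it would leave D_R empty
            mask = X[:, j] <= t
            gain = parent - loss(X[mask]) - loss(X[~mask])
            if gain > best[0]:
                best = (gain, j, float(t))
    return best

# Toy node: feature 0 separates the data cleanly, feature 1 is constant.
D = np.array([[0.0, 5.0], [0.0, 5.0], [10.0, 5.0], [10.0, 5.0]])
gain, feature, threshold = best_split(D)
print(gain, feature, threshold)
```

A best-first tree builder would compute this gain for every current leaf and expand the leaf whose best rule has the largest gain, as panel (c) of Figure 1 describes.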