Explaining Kernel Clustering via Decision Trees

Maximilian Fleissner, Leena Chennuru Vankadara, Debarghya Ghoshdastidar

TL;DR

This work investigates interpretable kernel clustering, and proposes algorithms that construct decision trees to approximate the partitions induced by kernel k-means, a nonlinear extension of k-means.

Abstract

Despite the growing popularity of explainable and interpretable machine learning, there is still surprisingly limited work on inherently interpretable clustering methods. Recently, there has been a surge of interest in explaining the classic k-means algorithm, leading to efficient algorithms that approximate k-means clusters using axis-aligned decision trees. However, interpretable variants of k-means have limited applicability in practice, where more flexible clustering methods are often needed to obtain useful partitions of the data. In this work, we investigate interpretable kernel clustering, and propose algorithms that construct decision trees to approximate the partitions induced by kernel k-means, a nonlinear extension of k-means. We further build on previous work on explainable k-means and demonstrate how a suitable choice of features allows preserving interpretability without sacrificing approximation guarantees on the interpretable model.
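To make the object being explained concrete, the following is a minimal sketch of kernel k-means with a Gaussian kernel, operating only on the Gram matrix. It is not the authors' implementation: the function names `gaussian_kernel` and `kernel_kmeans`, the Lloyd-style update, the random initialization, and the default `gamma` are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(X, Y, gamma=1.0):
    """Gram matrix with entries K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_kmeans(K, k, n_iter=50, seed=0):
    """Lloyd-style kernel k-means on a precomputed n x n Gram matrix K.

    Hypothetical helper: random initial labels, fixed iteration cap.
    """
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=n)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                continue  # empty cluster: its distance stays at infinity
            # ||phi(x_i) - mu_c||^2
            #   = K_ii - (2/m) sum_{j in c} K_ij + (1/m^2) sum_{j,l in c} K_jl
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, mask].sum(1) / m
                          + K[np.ix_(mask, mask)].sum() / m**2)
        new_labels = dist.argmin(1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels
```

Because the cluster means live in the kernel's feature space and are never formed explicitly, the resulting partition has no direct description in terms of the input coordinates, which is the gap the decision-tree approximation is meant to close.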

Paper Structure

This paper contains 36 sections, 18 theorems, 41 equations, 7 figures, 2 tables, 4 algorithms.

Key Result

Theorem 1

(The Gaussian kernel cannot have interpretable feature maps) Consider the Gaussian kernel $K(x,y) = e^{-\gamma \Vert x-y\Vert_2^2}$ in $d>1$ dimensions. There exists a dataset $X$ such that for any feature map $\phi : X \rightarrow \mathbb R^D$ satisfying $\langle \phi(x), \phi(y) \rangle = K(x,y)$

Figures (7)

  • Figure 1: $k$-means does not perform well on halfmoons data that is not linearly separable, and explainable $k$-means naturally inherits its flaws. Kernel $k$-means perfectly finds the clusters and hence, its interpretable variant (proposed Kernel IMM) returns an axis-aligned decision tree with good clustering.
  • Figure 2: A schematic of Kernel IMM for Interpretable Taylor kernels.
  • Figure 3: Standard $k$-means is ill-suited for clustering certain datasets, and this translates to explainable $k$-means (not plotted here). Kernel $k$-means recovers the ground truth well. However, Kernel IMM is restricted to $3$ leaves and not powerful enough to approximate it. To resolve this, we suggest Kernel ExKMC and Kernel Expand, which extend the tree to $6$ leaves.
  • Figure 4: We verify the approximation properties of our algorithms by computing the price of explainability (left plot). We also compare the clusters obtained on $k$-means and IMM, as well as kernel $k$-means and our three algorithms to the ground truth via the Rand index (right plot).
  • Figure 5: The threshold cut illustrated by the black vertical line defines a decision tree $T$ with 2 leaves. While some points do not end up in the same leaf as their corresponding center, the tree $T$ clearly does a good job in approximating the two clusters.
  • ...and 2 more figures
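The captions of Figures 4 and 5 mention a threshold cut and the price of explainability. The sketch below shows one way these could be computed. It assumes that the price of explainability is the ratio between the kernel k-means cost of the tree-induced partition and that of the reference partition. The helper names `kernel_kmeans_cost`, `threshold_cut`, and `price_of_explainability` are hypothetical and are not taken from the paper.

```python
import numpy as np

def kernel_kmeans_cost(K, labels):
    """Kernel k-means cost of a partition, computed from the Gram matrix K.

    For a cluster C of size m the cost is
    sum_{i in C} K_ii - (1/m) * sum_{i,j in C} K_ij.
    """
    cost = 0.0
    for c in np.unique(labels):
        mask = labels == c
        Kc = K[np.ix_(mask, mask)]
        cost += np.trace(Kc) - Kc.sum() / mask.sum()
    return cost

def threshold_cut(X, feature, threshold):
    """Two-leaf axis-aligned tree: leaf 0 if X[:, feature] <= threshold, else leaf 1."""
    return (X[:, feature] > threshold).astype(int)

def price_of_explainability(K, tree_labels, ref_labels):
    """Assumed definition: cost of the tree partition over cost of the reference.

    Undefined when the reference partition has zero cost.
    """
    return kernel_kmeans_cost(K, tree_labels) / kernel_kmeans_cost(K, ref_labels)

# Toy usage with a linear kernel, for which the cost equals the usual k-means cost.
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
K = X @ X.T
ref = np.array([0, 0, 1, 1])           # reference clustering
tree = threshold_cut(X, feature=0, threshold=5.0)
ratio = price_of_explainability(K, tree, ref)
```

A ratio close to 1 means the tree loses little relative to the reference clustering, while a poorly placed cut (for example `threshold=0.5` on the toy data) yields a ratio well above 1.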

Theorems & Definitions (34)

  • Definition 1
  • Theorem 1
  • Theorem 2
  • Definition 2
  • Remark 1
  • Definition 3
  • Theorem 3
  • Definition 4
  • Theorem 4
  • Theorem 5
  • ...and 24 more