Impossibility of Depth Reduction in Explainable Clustering

Chengyuan Deng, Surya Teja Gavva, Karthik C. S., Parth Patel, Adarsh Srinivasan

TL;DR

This work proves that, even for input points in the Euclidean plane, any depth reduction in the explanation incurs unbounded loss in the $k$-means and $k$-median cost.

Abstract

Over the last few years, explainable clustering has gathered a lot of attention. Dasgupta et al. [ICML'20] initiated the study of explainable $k$-means and $k$-median clustering problems where the explanation is captured by a threshold decision tree which partitions the space at each node using axis-parallel hyperplanes. Recently, Laber et al. [Pattern Recognition'23] made a case to consider the depth of the decision tree as an additional complexity measure of interest. In this work, we prove that even when the input points are in the Euclidean plane, any depth reduction in the explanation incurs unbounded loss in the $k$-means and $k$-median cost. Formally, we show that there exists a data set $X\subseteq \mathbb{R}^2$ for which there is a decision tree of depth $k-1$ whose $k$-means/$k$-median cost matches the optimal clustering cost of $X$, but every decision tree of depth less than $k-1$ has unbounded cost w.r.t. the optimal cost of clustering. We extend our results to the $k$-center objective as well, albeit with weaker guarantees.
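
As a rough illustration of what a threshold decision tree explanation looks like, the following minimal sketch assigns points to clusters through axis-parallel cuts and evaluates the resulting $k$-means cost. The class and function names (`Node`, `Leaf`, `assign`, `depth`, `kmeans_cost`) and the toy data are assumptions made for this example; they do not come from the paper and do not reproduce its hard instance.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple, Union

Point = Tuple[float, ...]


@dataclass
class Leaf:
    cluster: int  # cluster label assigned to every point reaching this leaf


@dataclass
class Node:
    axis: int         # coordinate tested at this node (axis-parallel cut)
    threshold: float  # points with x[axis] <= threshold go left
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


Tree = Union[Node, Leaf]


def assign(tree: Tree, x: Point) -> int:
    """Follow the threshold cuts from the root down to a leaf."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.axis] <= tree.threshold else tree.right
    return tree.cluster


def depth(tree: Tree) -> int:
    """Number of cuts on the longest root-to-leaf path."""
    if isinstance(tree, Leaf):
        return 0
    return 1 + max(depth(tree.left), depth(tree.right))


def kmeans_cost(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    clusters: Dict[int, List[Point]] = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        mean = [sum(p[i] for p in members) / len(members) for i in range(dim)]
        total += sum((p[i] - mean[i]) ** 2 for p in members for i in range(dim))
    return total


# Toy data in the plane with k = 3 well-separated pairs (hypothetical example).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (20.0, 0.0), (20.0, 1.0)]

# A tree of depth k - 1 = 2: cut at x <= 5, then at x <= 15.
tree = Node(axis=0, threshold=5.0,
            left=Leaf(0),
            right=Node(axis=0, threshold=15.0, left=Leaf(1), right=Leaf(2)))

labels = [assign(tree, p) for p in points]
print(depth(tree), labels, kmeans_cost(points, labels))
```

On this toy input the tree recovers the three natural pairs. The paper's result concerns carefully constructed point sets where no tree of depth less than $k-1$ can stay close to the optimal cost, which this sketch does not attempt to demonstrate.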

Paper Structure (14 sections, 12 theorems, 2 equations, 2 figures)

Key Result

Theorem 1.1

The following holds for the $k$-means, $k$-median, and $k$-center clustering objectives. For every $k,d\in \mathbb{N}$ with $d\ge k/2$, there is a point-set $X\subseteq\mathbb{R}^d$ such that $\mathsf{D}^\downarrow(X,k-2)$ is unbounded. Moreover, the price of explainability of $X$ is 1.
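
To put the depth budget $k-2$ in context, the following standard counting fact (stated here for illustration, not taken from the paper) bounds the depth of any binary threshold tree with $k$ leaves, assuming every internal node has exactly two children. The symbol $h(T)$ for the depth of a tree $T$ is notation introduced only for this example.

```latex
% A binary tree T with k leaves has k - 1 internal nodes, so
\[
  \lceil \log_2 k \rceil \;\le\; h(T) \;\le\; k - 1 .
\]
% Example: for k = 8, the depth ranges from 3 (balanced tree)
% to 7 (a path-like tree that peels off one cluster per cut).
```

Under this reading, a budget of $k-2$ is only one level below the maximum possible depth, so the theorem rules out even the smallest saving in depth for the constructed point-set.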

Figures (2)

  • Figure 1: Illustration of $X(w,d)$ with the explanation from a decision tree of depth $k-1$. The dotted ovals indicate the optimal clustering assignment, and each colored block is the corresponding subspace produced by an axis-parallel cut. Each point is associated with a weight $w_i$, and each new cluster is added at distance $d_i$ from the previous cluster $C_{i-2}$.
  • Figure 2: Illustration of the point-set $X$ with the explanation from a decision tree of depth $k-1$.

Theorems & Definitions (21)

  • Theorem 1.1: Impossibility of Shallow Explanations in High Dimensions; Section 3 in moshkovitz2020explainable
  • Theorem 1.2: Impossibility of Shallow Explanations for $k$-median and $k$-means in the Plane; Informal version of \ref{thm:2d-impossible}
  • Theorem 1.3: Lower Bound on Price of Depth Reduction for $k$-center in the Plane; Informal version of \ref{thm:center}
  • Definition 1: Price of explainability
  • Definition 2: Price of depth reduction
  • Theorem 3.1
  • Lemma 3.1
  • proof
  • Lemma 3.2: Size of this Point-set
  • proof
  • ...and 11 more