Impossibility of Depth Reduction in Explainable Clustering

Chengyuan Deng; Surya Teja Gavva; Karthik C. S.; Parth Patel; Adarsh Srinivasan

TL;DR

This work proves that, even when the input points lie in the Euclidean plane, any depth reduction in the explanation incurs an unbounded loss in the $k$-means and $k$-median cost.

Abstract

Over the last few years, explainable clustering has gathered a lot of attention. Dasgupta et al. [ICML'20] initiated the study of explainable $k$-means and $k$-median clustering problems, where the explanation is captured by a threshold decision tree which partitions the space at each node using axis-parallel hyperplanes. Recently, Laber et al. [Pattern Recognition'23] made a case to consider the depth of the decision tree as an additional complexity measure of interest. In this work, we prove that, even when the input points are in the Euclidean plane, any depth reduction in the explanation incurs unbounded loss in the $k$-means and $k$-median cost. Formally, we show that there exists a data set $X\subseteq \mathbb{R}^2$ for which there is a decision tree of depth $k-1$ whose $k$-means/$k$-median cost matches the optimal clustering cost of $X$, but every decision tree of depth less than $k-1$ has unbounded cost w.r.t. the optimal cost of clustering. We extend our results to the $k$-center objective as well, albeit with weaker guarantees.
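To make the objects in the abstract concrete, the following is a minimal Python sketch of a threshold decision tree with axis-parallel cuts, its depth, and the clustering cost it induces. It is an illustration only and not the paper's construction: the names `Leaf`, `Node`, `depth`, `leaves`, `kmeans_cost` and `kmedian_cost_l1` are hypothetical, the toy point-set is made up, and the $k$-median cost is assumed here to use the $\ell_1$ distance to the coordinate-wise median of each leaf.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import List, Tuple, Union

Point = Tuple[float, ...]


@dataclass
class Leaf:
    """A leaf of the threshold tree; every leaf is one cluster."""


@dataclass
class Node:
    """Internal node: an axis-parallel cut `x[axis] <= threshold`."""
    axis: int
    threshold: float
    left: "Tree"   # points with x[axis] <= threshold
    right: "Tree"  # points with x[axis] >  threshold


Tree = Union[Leaf, Node]


def depth(tree: Tree) -> int:
    """Number of cuts on the longest root-to-leaf path."""
    if isinstance(tree, Leaf):
        return 0
    return 1 + max(depth(tree.left), depth(tree.right))


def leaves(tree: Tree, points: List[Point]) -> List[List[Point]]:
    """Partition the points into clusters, one per leaf."""
    if isinstance(tree, Leaf):
        return [points]
    left = [p for p in points if p[tree.axis] <= tree.threshold]
    right = [p for p in points if p[tree.axis] > tree.threshold]
    return leaves(tree.left, left) + leaves(tree.right, right)


def kmeans_cost(clusters: List[List[Point]]) -> float:
    """Sum of squared Euclidean distances to each cluster's mean."""
    total = 0.0
    for cluster in clusters:
        if not cluster:
            continue
        center = [mean(col) for col in zip(*cluster)]
        total += sum(sum((x - c) ** 2 for x, c in zip(p, center))
                     for p in cluster)
    return total


def kmedian_cost_l1(clusters: List[List[Point]]) -> float:
    """Sum of l1 distances to the coordinate-wise median (an assumption)."""
    total = 0.0
    for cluster in clusters:
        if not cluster:
            continue
        center = [median(col) for col in zip(*cluster)]
        total += sum(sum(abs(x - c) for x, c in zip(p, center))
                     for p in cluster)
    return total


# Toy data in the plane: three well-separated pairs, so k = 3.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]

# A path-like tree of depth k - 1 = 2 that peels off one cluster per cut.
tree = Node(0, 5.0, Leaf(), Node(0, 15.0, Leaf(), Leaf()))
clusters = leaves(tree, points)

print(depth(tree), len(clusters))                         # 2 3
print(kmeans_cost(clusters), kmedian_cost_l1(clusters))   # 1.5 3.0
```

For $k=3$ any tree with three leaves already needs depth 2, so this toy example only illustrates the definitions. The result stated above concerns larger $k$, where a tree with $k$ leaves could in principle be much shallower than $k-1$.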

Paper Structure (14 sections, 12 theorems, 2 equations, 2 figures)


Introduction
Preliminaries
Impossibility of Depth Reduction in the Plane for $k$-means and $k$-median
Point-set Construction and Optimal Clustering
Proof of Theorem \ref{thm:2d-impossible}
Lower Bound on Price of Depth Reduction in the Plane for $k$-center
Point-set Construction
Proof of Theorem \ref{thm:center}
Discussion and Open Problems
Other Metric Spaces
Implications of Our Lower Bound Construction
Open Problems
Open Problem 1.
Open Problem 2.

Key Result

Theorem 1.1

The following holds for the $k$-means, $k$-median, and $k$-center clustering objectives. For every $k,d\in \mathbb{N}$ such that $d\ge k/2$, there is a point-set $X\subseteq\mathbb{R}^d$ such that $\mathsf{D}^\downarrow(X,k-2)$ is unbounded. Moreover, the price of explainability of $X$ is 1.
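The theorem relies on two quantities that are only named in this summary (Definitions 1 and 2 in the list further down). The block below is a hedged sketch of how they could be formalized, not the paper's exact wording: the symbols $\mathrm{PoE}$, $\mathrm{cost}$ and $\mathrm{OPT}_k$ are assumptions, and $T$ ranges over threshold decision trees with $k$ leaves.

```latex
\[
\mathrm{PoE}(X) \;=\; \frac{\min_{T}\ \mathrm{cost}(T, X)}{\mathrm{OPT}_k(X)},
\qquad
\mathsf{D}^\downarrow(X, h) \;=\;
\frac{\min_{T \,:\, \mathrm{depth}(T) \le h}\ \mathrm{cost}(T, X)}{\min_{T}\ \mathrm{cost}(T, X)} .
\]
```

Under this reading, the statement that $\mathsf{D}^\downarrow(X,k-2)$ is unbounded says that restricting the tree to depth at most $k-2$ can inflate the cost by an arbitrarily large factor. Because the price of explainability of $X$ is 1, the denominator $\min_T \mathrm{cost}(T,X)$ equals $\mathrm{OPT}_k(X)$ for this point-set, so measuring the loss against the best unrestricted tree or against the optimal clustering gives the same ratio.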

Figures (2)

Figure 1: Illustration of $X(w,d)$ with the explanation from a decision tree of depth $k-1$. The dotted ovals indicate the optimal clustering assignment, and each colored block is the corresponding subspace produced by an axis-parallel cut. Each point is associated with a weight $w_i$, and a new cluster is added at distance $d_i$ from the previous cluster $C_{i-2}$.
Figure 2: Illustration of the point-set $X$ with the explanation from a decision tree of depth $k-1$.

Theorems & Definitions (21)

Theorem 1.1: Impossibility of Shallow Explanations in High Dimensions; Section 3 in \cite{moshkovitz2020explainable}
Theorem 1.2: Impossibility of Shallow Explanations for $k$-median and $k$-means in the Plane; Informal version of Theorem \ref{thm:2d-impossible}
Theorem 1.3: Lower Bound on Price of Depth Reduction for $k$-center in the Plane; Informal version of Theorem \ref{thm:center}
Definition 1: Price of explainability
Definition 2: Price of depth reduction
Theorem 3.1
Lemma 3.1
proof
Lemma 3.2: Size of this Point-set
proof
...and 11 more
