Dynamic Algorithm for Explainable k-medians Clustering under lp Norm

Konstantin Makarychev, Ilias Papanikolaou, Liren Shan

TL;DR

The paper develops a dynamic, explainable clustering framework for k-medians under lp norms. It introduces a static Partition_Leaf-based algorithm that achieves a near-optimal trade-off between interpretability and clustering cost for all finite p >= 1. It proves an approximation bound of O(p (log k)^{1+1/p-1/p^2} log log k) relative to OPT_{k,p}(X) and extends the approach to a dynamic setting with amortized update time O(d log^3 k) and recourse O(log k), using an exponential-clock construction. Lower bounds complement the upper bound: an Omega(log k) barrier for every p >= 1, and a result ruling out a single algorithm that works for all p, with a separate Omega(d^{1/4}) bound for such universal strategies. The dynamic framework can be combined with existing dynamic k-medians methods and supports multi-k scenarios, enabling scalable, interpretable clustering of evolving datasets.
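
As a quick check of the stated bound, the exponent can be evaluated at a few values of p. This is plain arithmetic on the formula above, not a result quoted from the paper, and the name $e(p)$ is introduced here only for illustration:

$$
e(p) = 1 + \frac{1}{p} - \frac{1}{p^2}, \qquad e(1) = 1, \qquad e(2) = 1 + \frac{1}{2} - \frac{1}{4} = \frac{5}{4}, \qquad \lim_{p \to \infty} e(p) = 1.
$$

Under this reading, the bound is O(log k log log k) at p = 1 and O((log k)^{5/4} log log k) at p = 2.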

Abstract

We study the problem of explainable k-medians clustering introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (2020). In this problem, the goal is to construct a threshold decision tree that partitions data into k clusters while minimizing the k-medians objective. These trees are interpretable because each internal node makes a simple decision by thresholding a single feature, allowing users to trace and understand how each point is assigned to a cluster. We present the first algorithm for explainable k-medians under lp norm for every finite p >= 1. Our algorithm achieves an O(p(log k)^{1 + 1/p - 1/p^2} log log k) approximation to the optimal k-medians cost for any p >= 1. Previously, algorithms were known only for p = 1 and p = 2. For p = 2, our algorithm improves upon the existing bound of O(log^{3/2}k), and for p = 1, it matches the tight bound of log k + O(1) up to a multiplicative O(log log k) factor. We show how to implement our algorithm in a dynamic setting. The dynamic algorithm maintains an explainable clustering under a sequence of insertions and deletions, with amortized update time O(d log^3 k) and O(log k) recourse, making it suitable for large-scale and evolving datasets.
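
To make the notion of a threshold decision tree concrete, here is a minimal sketch of how such a tree assigns points to clusters and how its cost could be evaluated. It is not the paper's Partition_Leaf algorithm: the tree is built by hand, the names `Leaf`, `Node`, `assign`, `lp_dist`, and `tree_cost` are hypothetical, and the cost is assumed to be the sum of lp distances from each point to the center stored at its leaf.

```python
from dataclasses import dataclass
from typing import Sequence, Union

Point = Sequence[float]


@dataclass
class Leaf:
    center: Point  # center of the cluster represented by this leaf


@dataclass
class Node:
    feature: int      # index of the single coordinate tested here
    threshold: float  # points with x[feature] <= threshold go left
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def assign(tree: Tree, x: Point) -> Point:
    """Follow single-feature threshold tests from the root to a leaf."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree.center


def lp_dist(x: Point, c: Point, p: float) -> float:
    """lp distance between two points, for finite p >= 1."""
    return sum(abs(a - b) ** p for a, b in zip(x, c)) ** (1.0 / p)


def tree_cost(tree: Tree, points: Sequence[Point], p: float) -> float:
    """Assumed cost: sum of lp distances from each point to its leaf's center."""
    return sum(lp_dist(x, assign(tree, x), p) for x in points)


# Hand-built tree with k = 3 leaves over 2-dimensional points (illustrative data).
example_tree = Node(
    feature=0,
    threshold=5.0,
    left=Node(
        feature=1,
        threshold=5.0,
        left=Leaf((0.0, 0.0)),
        right=Leaf((0.0, 10.0)),
    ),
    right=Leaf((10.0, 0.0)),
)
example_points = [(1.0, 0.0), (9.0, 0.0), (0.0, 9.0), (3.0, 4.0)]
```

With this data, `tree_cost(example_tree, example_points, 2)` sums the distances 1, 1, 1, and 5, while the same call with `p = 1` replaces the last term by 7. Each assignment can be explained by reading off the tests along the root-to-leaf path, which is the interpretability property the abstract describes.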

Paper Structure

This paper contains 17 sections, 28 theorems, 117 equations, 3 figures.

Key Result

Theorem 3.1

Given a set of points $X$ and a set of $k$ centers $C$, for any $p \geq 1$, the algorithm finds a threshold tree $\mathcal{T}$ with $k$ leaves that has $k$-medians cost

Figures (3)

  • Figure 1: Algorithm $\textsc{Partition\_Leaf}$ for explainable $k$-medians in $\ell_p$
  • Figure 2: Dynamic algorithm for explainable $k$-medians in $\ell_p$
  • Figure 3: Fully Dynamic algorithm for explainable $k$-medians in $\ell_p$

Theorems & Definitions (55)

  • Theorem 3.1
  • Definition 3.2
  • Lemma 3.2
  • Proof of Theorem "improved analysis - lp upper bound"
  • Lemma 3.2
  • Lemma 3.2: Lemma 6.1 in makarychev2022explainable
  • Lemma 3.2
  • Lemma 3.2
  • Definition 3.3
  • Definition 3.4
  • ...and 45 more