
Clustering in Varying Metrics

Deeparnab Chakrabarty, Jonathan Conroy, Ankita Sarkar

TL;DR

We investigate aggregate clustering across multiple distance metrics on the same point set, formalizing the objective as minimizing a homogeneous aggregator $\Psi$ over the per-scenario costs $\mathrm{cost}_t(d_t;S)$. The work identifies a sharp complexity boundary: for $T\ge 3$ no polynomial-time algorithm achieves any finite approximation factor (unless $\mathrm{P}=\mathrm{NP}$), while $T=2$ admits constant-factor approximations. Parameterizing by both $k$ and $T$ yields an $f(k,T)\mathrm{poly}(n)$-time $(3+\varepsilon)$-approximation. The paper gives EPAS results for well-structured metrics (bounded scatter dimension and bounded-treewidth graphs) together with matching ETH-based lower bounds. The techniques include Hochbaum–Shmoys filtering, matroid intersection, LP relaxations with half-integral rounding, and treewidth-based dynamic programming. Together, these results delineate when clustering across several metrics is tractable and when strong hardness persists.

Abstract

We introduce the aggregated clustering problem, where one is given $T$ instances of a center-based clustering task over the same $n$ points, but under different metrics. The goal is to open $k$ centers to minimize an aggregate of the clustering costs -- e.g., the average or maximum -- where the cost is measured via $k$-center/median/means objectives. More generally, we minimize a norm $\Psi$ over the $T$ cost values. We show that for $T \geq 3$, the problem is inapproximable to any finite factor in polynomial time. For $T = 2$, we give constant-factor approximations. We also show W[2]-hardness when parameterized by $k$, but obtain $f(k,T)\mathrm{poly}(n)$-time 3-approximations when parameterized by both $k$ and $T$. When the metrics have structure, we obtain efficient parameterized approximation schemes (EPAS). If all $T$ metrics have bounded $\varepsilon$-scatter dimension, we achieve a $(1+\varepsilon)$-approximation in $f(k,T,\varepsilon)\mathrm{poly}(n)$ time. If the metrics are induced by edge weights on a common graph $G$ of bounded treewidth $\mathsf{tw}$, and $\Psi$ is the sum function, we get an EPAS in $f(T,\varepsilon,\mathsf{tw})\mathrm{poly}(n,k)$ time. Conversely, unless (randomized) ETH is false, any finite factor approximation is impossible if parameterized by only $T$, even when the treewidth is $\mathsf{tw} = \Omega(\mathrm{poly}\log n)$.
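To make the objective concrete, the following is a minimal sketch of evaluating the aggregate cost of a center set and finding an optimal one by exhaustive search. It is not an algorithm from the paper: the function names, the distance-matrix representation, and the toy instance are all hypothetical, and the per-metric cost is assumed to be the standard $(k,z)$-clustering cost $\left(\sum_v d_t(v,S)^z\right)^{1/z}$, with $z=\infty$ giving $k$-center.

```python
from itertools import combinations
from math import inf

def clustering_cost(dist, centers, z):
    """Cost of `centers` under one metric, given as an n x n distance matrix.
    Assumed form: (sum_v d(v, S)^z)^(1/z); z = inf is k-center, z = 1 is k-median."""
    nearest = [min(row[c] for c in centers) for row in dist]
    if z == inf:
        return max(nearest)
    return sum(d ** z for d in nearest) ** (1.0 / z)

def aggregate_cost(metrics, centers, z, psi):
    """Apply the aggregator psi (e.g. max or sum) to the T per-metric costs."""
    return psi([clustering_cost(dist, centers, z) for dist in metrics])

def brute_force(metrics, k, z, psi):
    """Exhaustive search over all k-subsets; exponential in k, illustration only."""
    n = len(metrics[0])
    return min(combinations(range(n), k),
               key=lambda S: aggregate_cost(metrics, S, z, psi))

def line_metric(positions):
    """Hypothetical helper: metric induced by points placed on a line."""
    return [[abs(p - q) for q in positions] for p in positions]

# Toy instance with T = 2: the same four points, arranged differently per metric.
metrics = [line_metric([0, 1, 10, 11]), line_metric([0, 10, 1, 11])]
best = brute_force(metrics, k=2, z=inf, psi=max)
print(best, aggregate_cost(metrics, best, inf, max))
```

In this toy instance, the center set $\{0,1\}$ is good for the second metric but poor for the first, so the search must pick centers that serve both metrics at once; this coupling across metrics is what distinguishes the aggregate problem from solving $T$ independent clustering instances.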

Paper Structure

This paper contains 15 sections, 25 theorems, 28 equations, 1 figure.

Key Result

Theorem 3

For any homogeneous aggregator $\Psi$ and any $z \in \mathbb N \cup \left\{\infty\right\}$, there is no $f(T)\mathrm{poly}(n,k)$-time finite-factor approximation for $\Psi$-aggregate $(k,z)$-clustering on finite metrics on $n$ vertices unless $\mathrm{P}=\mathrm{NP}$. The result holds even for $T=3$.

Figures (1)

  • Figure 1: Left: A $0/\infty$-metric $d$ represented as a planar graph. Center: A reweighting of a grid graph $H$ that realizes $d$, as in \ref{lem:grid-drawing}. Right: A reweighting of a large-treewidth graph $G$ that realizes $d$, as in \ref{cor:drawing-on-treewidth}; the supernodes of a grid minor are drawn in orange. In all three images, green lines represent 0-weight edges, and gray lines represent $\infty$-weight edges.

Theorems & Definitions (55)

  • Definition 1: Aggregate Clustering Problems
  • Definition 2: Generalized Aggregate Clustering
  • Theorem 3: Hardness of approximation when $T\geq 3$
  • proof
  • Theorem 4
  • Theorem 5: Hardness of approximation on stars in FPT time.
  • proof
  • Remark 6
  • Lemma 7: Theorem 9 of schaefer2021new
  • Lemma 8
  • ...and 45 more