
Low-Distortion Clustering with Ordinal and Limited Cardinal Information

Jakob Burkhardt, Ioannis Caragiannis, Karl Fehrs, Matteo Russo, Chris Schwiegelshohn, Sudarshan Shyam

TL;DR

The paper studies clustering under ordinal (rank-based) information with implicit metrics, defining metric distortion as the worst-case ratio to the fully informed optimum. It develops both low- and zero-query strategies to achieve constant distortion for key objectives: distortion 2 for $k$-center with $O(k^2)$ queries, distortion 4 with $O(k)$ queries, and a randomized $O(1)$-distortion scheme for $(k,z)$-clustering using sublinear query budgets. It also adapts Meyerson-style approaches for facility location and proves strong lower bounds, including exponential center blowup requirements in the zero- and low-query regimes. Overall, the work maps the tradeoffs between query complexity, center-budget flexibility, and distortion, offering practical ordinal algorithms with provable performance guarantees while clarifying inherent limitations.

Abstract

Motivated by recent work in computational social choice, we extend the metric distortion framework to clustering problems. Given a set of $n$ agents located in an underlying metric space, our goal is to partition them into $k$ clusters, optimizing some social cost objective. The metric space is defined by a distance function $d$ between the agent locations. Information about $d$ is available only implicitly via $n$ rankings, through which each agent ranks all other agents in terms of their distance from her. Still, we would like to evaluate clustering algorithms in terms of social cost objectives that are defined using $d$. This is done using the notion of distortion, which measures how far from optimality a clustering can be, taking into account all underlying metrics that are consistent with the ordinal information available. Unfortunately, the most important clustering objectives do not admit algorithms with finite distortion. To sidestep this disappointing fact, we follow two alternative approaches: We first explore whether resource augmentation can be beneficial. We consider algorithms that use more than $k$ clusters but compare their social cost to that of the optimal $k$-clusterings. We show that using exponentially (in terms of $k$) many clusters, we can get low (constant or logarithmic) distortion for the $k$-center and $k$-median objectives. Interestingly, such an exponential blowup is shown to be necessary. More importantly, we explore whether limited cardinal information can be used to obtain better results. Somewhat surprisingly, for $k$-median and $k$-center, we show that a number of queries that is polynomial in $k$ and only logarithmic in $n$ (i.e., only sublinear in the number of agents for the most relevant scenarios in practice) is enough to get constant distortion.
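
To make the setting concrete, the following is a minimal Python sketch of the ingredients the abstract describes: the ordinal rankings induced by a distance function $d$, the $k$-center social cost, and the ratio between the cost of a chosen set of centers and the optimum. It is an illustration only, not the paper's algorithm. The function names, the toy instance of four agents on a line, and the restriction of centers to agent locations are all assumptions made here. The distortion itself is the worst case of this ratio over every metric consistent with the rankings; the sketch evaluates a single metric.

```python
from itertools import combinations

def rankings(dist):
    """Ordinal profile: agent i ranks all other agents by increasing distance."""
    n = len(dist)
    return [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
            for i in range(n)]

def k_center_cost(dist, centers):
    """k-center cost: largest distance from any agent to its nearest center."""
    return max(min(dist[i][c] for c in centers) for i in range(len(dist)))

def optimal_k_center(dist, k):
    """Brute-force optimum over all k-subsets of agents (tiny instances only).
    Assumption: centers are chosen among the agent locations."""
    return min(k_center_cost(dist, cs)
               for cs in combinations(range(len(dist)), k))

def cost_ratio(dist, centers, k):
    """Cost of the chosen centers relative to the optimal k-clustering,
    for this one metric (not the worst case over consistent metrics)."""
    return k_center_cost(dist, centers) / optimal_k_center(dist, k)

# Hypothetical toy instance: four agents on a line at positions 0, 1, 10, 11.
pos = [0, 1, 10, 11]
dist = [[abs(a - b) for b in pos] for a in pos]

profile = rankings(dist)               # what an ordinal algorithm would see
ratio = cost_ratio(dist, (0, 1), k=2)  # both centers placed in the left pair
```

In this instance the two natural clusters are {0, 1} and {2, 3}; placing both centers in the left cluster leaves the right cluster far from any center, which is the kind of error a low-distortion algorithm must avoid without ever seeing `dist` directly.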

Paper Structure

This paper contains 34 sections, 24 theorems, 46 equations, 7 algorithms.

Key Result

Theorem 3.1

There exists a deterministic $2$-distortion algorithm for $k$-center that makes $\frac{k^2 - k}{2}$ distance queries.
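
As a quick check on the query count (a remark added here, not part of the theorem statement), the bound can be rewritten as a binomial coefficient:

```latex
\[
  \frac{k^2 - k}{2} \;=\; \frac{k(k-1)}{2} \;=\; \binom{k}{2}.
\]
```

This is the number of unordered pairs among $k$ points, so the count is quadratic in $k$ and does not depend on the number of agents $n$, in line with the $O(k^2)$ figure quoted in the TL;DR. For example, $k = 10$ gives $45$ queries. Whether the algorithm actually queries all pairwise distances among $k$ candidate points is not stated in this excerpt and should be read as a guess.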
