Low-Distortion Clustering with Ordinal and Limited Cardinal Information

Jakob Burkhardt; Ioannis Caragiannis; Karl Fehrs; Matteo Russo; Chris Schwiegelshohn; Sudarshan Shyam

TL;DR

The paper studies clustering when the underlying metric is available only implicitly, through ordinal (rank-based) information. It measures quality by metric distortion: the worst-case ratio, over all metrics consistent with that information, between the cost of the computed clustering and the optimal cost. It develops zero-query and low-query algorithms for key objectives. For $k$-center, it gives a $2$-distortion algorithm with $O(k^2)$ distance queries and a $4$-distortion algorithm with $O(k)$ queries. For $(k,z)$-clustering, it gives a zero-query bi-criteria algorithm and a randomized $O(1)$-distortion algorithm with $O(k^4 \log^5 n)$ queries, a budget that is polynomial in $k$ and polylogarithmic in the number of agents $n$. It also adapts Meyerson-style approaches to facility location and proves strong lower bounds, including that an exponential blowup in the number of centers is necessary in the zero- and low-query regimes. Overall, the work maps the tradeoffs between query complexity, flexibility in the number of centers, and distortion, offering practical ordinal algorithms with provable guarantees while clarifying their inherent limitations.
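For reference, one standard way to write the distortion described above is given below. The notation is ours and may differ from the paper's: $\sigma$ is the profile of rankings, $d \triangleright \sigma$ means that the metric $d$ is consistent with $\sigma$, and $\mathcal{C}_k$ is the set of clusterings with $k$ centers. For a deterministic algorithm $A$ that may open $k' \ge k$ centers (resource augmentation), while the benchmark still uses $k$:

$$\operatorname{dist}(A) \;=\; \sup_{\sigma}\ \sup_{d \,\triangleright\, \sigma}\ \frac{\mathrm{cost}_d\big(A(\sigma)\big)}{\min_{C \in \mathcal{C}_k} \mathrm{cost}_d(C)},$$

$$\mathrm{cost}^{\text{center}}_d(C) = \max_{i} \min_{c \in C} d(i,c), \qquad \mathrm{cost}^{\text{median}}_d(C) = \sum_{i} \min_{c \in C} d(i,c).$$

When the algorithm also makes distance queries, $d$ must additionally agree with every answered query, and for a randomized algorithm the numerator is replaced by its expectation. Under one common convention, $(k,z)$-clustering sums $\min_{c \in C} d(i,c)^z$ over agents, so $k$-median is the case $z = 1$.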

Abstract

Motivated by recent work in computational social choice, we extend the metric distortion framework to clustering problems. Given a set of $n$ agents located in an underlying metric space, our goal is to partition them into $k$ clusters, optimizing some social cost objective. The metric space is defined by a distance function $d$ between the agent locations. Information about $d$ is available only implicitly via $n$ rankings, through which each agent ranks all other agents in terms of their distance from her. Still, we would like to evaluate clustering algorithms in terms of social cost objectives that are defined using $d$. This is done using the notion of distortion, which measures how far from optimality a clustering can be, taking into account all underlying metrics that are consistent with the ordinal information available. Unfortunately, the most important clustering objectives do not admit algorithms with finite distortion. To sidestep this disappointing fact, we follow two alternative approaches: We first explore whether resource augmentation can be beneficial. We consider algorithms that use more than $k$ clusters but compare their social cost to that of the optimal $k$-clusterings. We show that using exponentially (in terms of $k$) many clusters, we can get low (constant or logarithmic) distortion for the $k$-center and $k$-median objectives. Interestingly, such an exponential blowup is shown to be necessary. More importantly, we explore whether limited cardinal information can be used to obtain better results. Somewhat surprisingly, for $k$-median and $k$-center, we show that a number of queries that is polynomial in $k$ and only logarithmic in $n$ (i.e., only sublinear in the number of agents for the most relevant scenarios in practice) is enough to get constant distortion.
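To make the ordinal model concrete, the following Python sketch derives each agent's ranking from a hidden distance matrix, checks whether another distance matrix is consistent with those rankings, and evaluates the $k$-center and $k$-median costs by brute force. All function names are hypothetical, and centers are assumed to be chosen among the agents. Two line metrics that induce identical rankings give different cost ratios for the same choice of centers, which is why distortion takes the worst case over all consistent metrics.

```python
from itertools import combinations


def line_metric(positions):
    """Distance matrix of points on a line (always a valid metric)."""
    return [[abs(p - q) for q in positions] for p in positions]


def rankings_from_metric(d):
    """Each agent ranks all other agents from closest to farthest; ties broken by index."""
    n = len(d)
    return [sorted((j for j in range(n) if j != i), key=lambda j: (d[i][j], j))
            for i in range(n)]


def is_consistent(d, rankings):
    """True if every agent's distances are non-decreasing along her ranking.

    Only the rankings are checked; the triangle inequality is not.
    """
    for i, order in enumerate(rankings):
        for a, b in zip(order, order[1:]):
            if d[i][a] > d[i][b]:
                return False
    return True


def kcenter_cost(d, centers):
    """Largest distance from any agent to her nearest center."""
    return max(min(d[i][c] for c in centers) for i in range(len(d)))


def kmedian_cost(d, centers):
    """Total distance from the agents to their nearest centers."""
    return sum(min(d[i][c] for c in centers) for i in range(len(d)))


def optimal_cost(d, k, cost):
    """Brute-force optimum over all k-subsets of agents (tiny instances only)."""
    return min(cost(d, centers) for centers in combinations(range(len(d)), k))


# Two metrics that induce the same rankings.
A = line_metric([0, 1, 2, 10])
B = line_metric([0, 1, 2, 5])
sigma = rankings_from_metric(A)
print(is_consistent(B, sigma))  # True
for d in (A, B):
    # Cost ratio for the fixed choice of centers {0, 1}.
    print(kcenter_cost(d, [0, 1]) / optimal_cost(d, 2, kcenter_cost))  # 9.0, then 4.0
```

Moving agent 3 farther to the right keeps all rankings unchanged while the ratio for the fixed centers {0, 1} grows without bound. Line metrics satisfy the triangle inequality automatically; general distance matrices would need a separate check.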

Paper Structure (34 sections, 24 theorems, 46 equations, 7 algorithms)

Introduction
Our Results
Related Work
Ordinal Preferences and Distortion
Clustering and Facility Location
Preliminaries
Algorithms for $k$-Center
$2$-Distortion Algorithms
$4$-Distortion Algorithm with $O(k)$ Queries
Algorithms for $(k,z)$-Clustering
Zero-Query Bi-Criteria Algorithm
Analysis:
$O(1)$-Distortion Algorithm with $O(k^4 \log^5 n)$ Queries
Algorithm.
Analysis.
...and 19 more sections

Key Result

Theorem 3.1

There exists a deterministic $2$-distortion algorithm for $k$-center that makes $\frac{k^2 - k}{2}$ distance queries.
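The query budget in Theorem 3.1 equals the number of unordered pairs among $k$ agents, $\binom{k}{2} = \frac{k(k-1)}{2} = \frac{k^2-k}{2}$. The Python sketch below checks this arithmetic with a hypothetical distance oracle that counts distinct queries. It only illustrates the cost of querying every pair in a $k$-element set; it is not the algorithm behind Theorem 3.1, whose steps are not reproduced in this overview.

```python
from itertools import combinations


class CountingOracle:
    """Hypothetical cardinal oracle: answers d(i, j) and counts distinct queries."""

    def __init__(self, d):
        self._d = d
        self._answers = {}
        self.queries = 0

    def query(self, i, j):
        key = (min(i, j), max(i, j))  # d is symmetric, so (i, j) and (j, i) coincide
        if key not in self._answers:
            self.queries += 1
            self._answers[key] = self._d[i][j]
        return self._answers[key]


def query_all_pairs(oracle, agents):
    """Query the distance between every unordered pair of the given agents."""
    return {(a, b): oracle.query(a, b) for a, b in combinations(agents, 2)}


positions = [0, 3, 4, 9, 15, 16, 22, 30]
d = [[abs(p - q) for q in positions] for p in positions]
for k in range(1, len(positions) + 1):
    oracle = CountingOracle(d)
    query_all_pairs(oracle, range(k))
    query_all_pairs(oracle, range(k))  # repeated pairs are answered from the cache
    assert oracle.queries == (k * k - k) // 2
print("query counts match (k^2 - k) / 2")
```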

Theorems & Definitions (50)

Definition 2.1
Definition 2.2
Theorem 3.1
Theorem 3.2
Theorem 3.3
Proof
Lemma 3.5
Proof
Lemma 3.6
Proof
...and 40 more
