
Clustering with Non-adaptive Subset Queries

Hadley Black, Euiwoong Lee, Arya Mazumdar, Barna Saha

TL;DR

This work studies recovering a hidden $k$-clustering on $n$ items using non-adaptive subset queries that report how many clusters intersect a query set. By linking subset queries to combinatorial group testing and exploiting random-graph connectivity, the authors devise near-linear non-adaptive algorithms for unrestricted-size queries, with refined bounds for size-bounded and balanced scenarios. They show that $O(n\log k\cdot(\log k+\log\log n)^2)$ queries suffice in general (improving to $O(n\log\log n)$ for constant $k$), and prove a lower bound of $\Omega(\max(n^2/s^2,n))$ queries when the query size is restricted to at most $s$. Additional results cover balanced clusters, and allowing two rounds of adaptivity improves the bounds to $O(n\log k)$ (general) and $O(n\log\log k)$ (balanced). Overall, the paper gives the first non-adaptive algorithms for clustering with subset queries, achieving near-linear query complexity across several regimes and linking the problem to established combinatorial and graph-theoretic techniques.

Abstract

Recovering the underlying clustering of a set $U$ of $n$ points by asking pair-wise same-cluster queries has garnered significant interest in the last decade. Given a query $S \subset U$, $|S|=2$, the oracle returns yes if the points are in the same cluster and no otherwise. For adaptive algorithms with pair-wise queries, the number of required queries is known to be $\Theta(nk)$, where $k$ is the number of clusters. However, non-adaptive schemes require $\Omega(n^2)$ queries, which matches the trivial $O(n^2)$ upper bound attained by querying every pair of points. To break the quadratic barrier for non-adaptive queries, we study a generalization of this problem to subset queries for $|S|>2$, where the oracle returns the number of clusters intersecting $S$. Allowing for subset queries of unbounded size, $O(n)$ queries is possible with an adaptive scheme (Chakrabarty-Liao, 2024). However, the realm of non-adaptive algorithms is completely unknown. In this paper, we give the first non-adaptive algorithms for clustering with subset queries. Our main result is a non-adaptive algorithm making $O(n \log k \cdot (\log k + \log\log n)^2)$ queries, which improves to $O(n \log \log n)$ when $k$ is a constant. We also consider algorithms with a restricted query size of at most $s$. In this setting we prove that $\Omega(\max(n^2/s^2,n))$ queries are necessary and obtain algorithms making $\tilde{O}(n^2k/s^2)$ queries for any $s \leq \sqrt{n}$ and $\tilde{O}(n^2/s)$ queries for any $s \leq n$. We also consider the natural special case when the clusters are balanced, obtaining non-adaptive algorithms which make $O(n \log k) + \tilde{O}(k)$ and $O(n\log^2 k)$ queries. Finally, allowing two rounds of adaptivity, we give an algorithm making $O(n \log k)$ queries in the general case and $O(n \log \log k)$ queries when the clusters are balanced.
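The query model and the trivial quadratic baseline described in the abstract can be illustrated with a small sketch. It is not one of the paper's algorithms: it only models the oracle (which returns the number of clusters intersecting a query set) and the all-pairs non-adaptive scheme that the paper aims to beat. The names `make_oracle`, `recover_with_pair_queries`, and the label-list representation of the hidden clustering are assumptions made for illustration.

```python
from itertools import combinations


def make_oracle(labels):
    """Build a subset-query oracle; labels[i] is the hidden cluster of point i.

    The label-list representation is a hypothetical choice for this sketch.
    """
    def query(subset):
        # The oracle reports how many distinct clusters intersect the query set.
        return len({labels[i] for i in subset})
    return query


def recover_with_pair_queries(n, query):
    """Trivial non-adaptive baseline: query every pair, then merge equal pairs."""
    # Non-adaptive: the whole query list is fixed before any answer is seen.
    pairs = list(combinations(range(n), 2))
    answers = [query(p) for p in pairs]

    # Union-find over the points, merging pairs reported as same-cluster.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), count in zip(pairs, answers):
        if count == 1:  # exactly one cluster intersects {u, v}
            parent[find(u)] = find(v)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


hidden = [0, 1, 0, 2, 1, 0]  # made-up example clustering
oracle = make_oracle(hidden)
print(oracle([0, 1, 3]))
print(recover_with_pair_queries(len(hidden), oracle))
```

The baseline issues $n(n-1)/2$ queries, which is the $O(n^2)$ upper bound mentioned above; the paper's contribution is to use larger query sets so that far fewer non-adaptive queries suffice.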

Paper Structure (40 sections, 35 theorems, 53 equations, 8 algorithms)

Key Result

Theorem 1.1

There is a randomized, non-adaptive $k$-clustering algorithm making $O(n \log k \cdot (\log k + \log \log n)^2)$ subset queries.
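To get a rough feel for how the bound in Theorem 1.1 compares with the quadratic all-pairs baseline, the expression can be evaluated numerically. This is only an illustration: the constant hidden by the $O(\cdot)$ notation is not given here, so the sketch assumes it is 1 and uses base-2 logarithms, and the sample values of $n$ and $k$ are arbitrary. The function names are hypothetical.

```python
import math


def main_bound(n, k):
    """n * log k * (log k + log log n)^2, with the hidden constant assumed to be 1."""
    log_k = math.log2(k)
    log_log_n = math.log2(math.log2(n))
    return n * log_k * (log_k + log_log_n) ** 2


def all_pairs(n):
    """Number of queries used by the trivial scheme that asks every pair."""
    return n * (n - 1) // 2


# Arbitrary sample sizes, chosen only to show the growth rates side by side.
for n, k in [(10**4, 4), (10**6, 4), (10**6, 64)]:
    print(n, k, round(main_bound(n, k)), all_pairs(n))
```

Because the true constant factor is unknown, the printed numbers indicate growth rates only and should not be read as actual query counts.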

Theorems & Definitions (60)

  • Theorem 1.1: \ref{thm:1}, informal
  • Theorem 1.2: \ref{thm:nloglogn}, informal
  • Theorem 1.3: \ref{cor:3-s-LB}, restated
  • Theorem 1.4: \ref{thm:bounded-2}, informal
  • Theorem 1.5: \ref{thm:bounded-1}, informal
  • Theorem 1.6: \ref{thm:k-bal-1}, \ref{thm:k-bal-2}, informal
  • Theorem 1.7: \ref{thm:2-round}, \ref{thm:2-round-bal}, informal
  • Lemma 1.7
  • Corollary 1.8
  • proof
  • ...and 50 more