Accelerating data-driven algorithm selection for combinatorial partitioning problems

Vaggos Chatziafratis; Ishani Karmarkar; Yingxi Li; Ellen Vitercik

TL;DR

The paper tackles the scalability bottleneck in data-driven algorithm selection by introducing size generalization: predicting an algorithm's performance on a large instance from a small, representative subsample. It develops rigorous guarantees for clustering and max-cut, covering center-based methods (k-means++ and k-centers, the latter via a softened variant of Gonzalez's heuristic) and single-linkage for clustering, and Goemans-Williamson (GW) and Greedy for max-cut, with subsample sizes that can be independent of the full instance size under natural conditions. The authors introduce Seeding and ApxSeeding for clustering and SoftmaxCenters to balance exploration and approximation, and provide a general SDP- and martingale-based framework relating subgraph objectives to full-graph performance. Complemented by experiments on synthetic and real data, the work demonstrates substantial runtime speedups for algorithm selection while preserving predictive accuracy, and outlines a path to extend size generalization to broader optimization problems.

Abstract

Data-driven algorithm selection is a powerful approach for choosing effective heuristics for computational problems. It operates by evaluating a set of candidate algorithms on a collection of representative training instances and selecting the one with the best empirical performance. However, running each algorithm on every training instance is computationally expensive, making scalability a central challenge. In practice, a common workaround is to evaluate algorithms on smaller proxy instances derived from the original inputs. However, this practice has remained largely ad hoc and lacked theoretical grounding. We provide the first theoretical foundations for this practice by formalizing the notion of size generalization: predicting an algorithm's performance on a large instance by evaluating it on a smaller, representative instance, subsampled from the original instance. We provide size generalization guarantees for three widely used clustering algorithms (single-linkage, $k$-means++, and Gonzalez's $k$-centers heuristic) and two canonical max-cut algorithms (Goemans-Williamson and Greedy). We characterize the subsample size sufficient to ensure that performance on the subsample reflects performance on the full instance, and our experiments support these findings.
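The idea of size generalization described in the abstract can be illustrated with a small sketch for the Greedy max-cut heuristic. This is not the authors' procedure or code: the random graph, the helper names (`greedy_max_cut`, `random_graph`, `induced_subgraph`), the sizes `n` and `t`, and the choice to normalize cut values by the squared number of nodes are all assumptions made for illustration. The sketch runs Greedy on a full graph and on a uniformly random induced subgraph, then compares the normalized cut values.

```python
import random

def greedy_max_cut(nodes, edges):
    """Greedy heuristic: place each node on the side that cuts more
    edges to its already-placed neighbors. Returns the cut size."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    side = {}
    for v in nodes:
        in_zero = sum(1 for u in adj[v] if side.get(u) == 0)
        in_one = sum(1 for u in adj[v] if side.get(u) == 1)
        # Joining side 1 cuts the edges to side-0 neighbors, and vice versa.
        side[v] = 1 if in_zero >= in_one else 0
    return sum(1 for u, v in edges if side[u] != side[v])

def random_graph(n, p, rng):
    """Hypothetical test instance: each edge present with probability p."""
    return [(u, v) for u in range(n) for v in range(u + 1, n)
            if rng.random() < p]

def induced_subgraph(n, edges, t, rng):
    """Sample t nodes uniformly without replacement and keep the
    edges with both endpoints in the sample."""
    kept = sorted(rng.sample(range(n), t))
    kept_set = set(kept)
    return kept, [(u, v) for u, v in edges
                  if u in kept_set and v in kept_set]

rng = random.Random(0)
n, t = 300, 60  # illustrative sizes, not taken from the paper
edges = random_graph(n, 0.2, rng)
sub_nodes, sub_edges = induced_subgraph(n, edges, t, rng)

# Normalize by the squared node count so the two values are comparable
# (an assumed normalization for this sketch).
full_density = greedy_max_cut(range(n), edges) / n**2
sub_density = greedy_max_cut(sub_nodes, sub_edges) / t**2
print(f"full: {full_density:.4f}  subsample estimate: {sub_density:.4f}")
```

In this toy setting the subsample run touches far fewer edges than the full run; how close the two normalized values are, and how large `t` must be, is exactly what the paper's guarantees address.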

Paper Structure

This paper contains 54 sections, 49 theorems, 124 equations, 12 figures, 3 tables, 5 algorithms.

Introduction
Motivation from prior empirical work.
Our contributions
Clustering algorithm selection.
Max-cut algorithm selection.
Additional related work
General notation
Size generalization for clustering algorithm selection
$k$-Means++ and $k$-Centers clustering
Application to Gonzalez's $k$-centers heuristic.
Application to $k$-means++.
Single-linkage clustering
Max-cut
The Goemans-Williamson (GW) algorithm
Size generalization for GW SDP objective value.
...and 39 more sections

Key Result

Theorem 2.2

Let $\mathcal{X} \subset \mathbb{R}^d$, $\epsilon, \epsilon' > 0$, $\delta \in (0, 1)$, and $k \in \mathbb{Z}_{> 0}$. Define the sample complexity $m = \mathcal{O}(\zeta_{k,f}(\mathcal{X})\log(k/\epsilon))$, where $\zeta_{k,f}(\mathcal{X})$ quantifies the sampling distribution's smoothness. Let $S$ and $S'$ be the partitions of $\mathcal{X}$ induced by $\mathsf{Seeding}(\mathcal{X}, k, f$ …

Figures (12)

Figure 1: The proxy algorithms' accuracies on the subsample approach those of the original algorithms on the full instance as the sample size grows. One panel shows this for clustering algorithms and another for max-cut algorithms. Shadows denote two standard errors about the average.
Figure 2: Sensitivity of $k$-means++ with respect to a single point. The example shows that the algorithm's accuracy can be extremely sensitive to the presence or absence of a single point, in this case, the outlier at (20, 20). Depending on how the ground truth is defined, deleting the outlier can either boost or drop the accuracy by up to 50%.
Figure 3: Example of a ground truth where size generalization by subsampling fails for single linkage. The shading indicates the ground truth. The first $n-1$ points are unit distance apart, while the last two points are $1+x> 1$ distance apart. Single linkage has 0 cost on the full dataset. After randomly deleting a single point, the largest gap between consecutive points is equally likely to occur between any consecutive points. So, the expected cost on the subsample is $\geq \frac{1}{n} \sum_{i=2}^{n-1} \frac{(n-1-i)}{n} \overset{n \rightarrow \infty}{\rightarrow} 1/2$.
Figure 4: Sensitivity of Goemans-Williamson (GW) with respect to the deletion of one node. The GW gap is computed as $\frac{{\mathsf{GW}}(G)}{{\mathsf{SDP}}(G)}$. This figure illustrates the GW gap distribution on the full Petersen network and on a Petersen network with one node deleted. Notice that the GW gap differs drastically when a single node is deleted from the graph.
Figure 5: An example graph with different optimal SDP solutions leading to different GW output distributions. One panel shows an example graph with many distinct optimal SDP solutions. By definition, they all have the same GW SDP objective value. The other panel shows the distribution of cut values returned by GW over two distinct optimal GW SDP solutions for that graph, with 200 random hyperplanes sampled for each solution. The two optimal solutions yield very different GW output distributions.
...and 7 more figures
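
The limit stated in the Figure 3 caption can be checked directly. The short derivation below is a sketch that only re-indexes the sum from the caption (substituting $j = n-1-i$); it adds no assumptions beyond the caption's own expression.

```latex
\[
\frac{1}{n}\sum_{i=2}^{n-1}\frac{n-1-i}{n}
  = \frac{1}{n^{2}}\sum_{j=0}^{n-3} j
  = \frac{(n-3)(n-2)}{2n^{2}}
  \;\xrightarrow{\;n\to\infty\;}\; \frac{1}{2}.
\]
```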

Theorems & Definitions (91)

Definition 2.1 (Ashtiani15:Representation, Balcan13:Clustering, Balcan17:Learning)
Theorem 2.2
Proof sketch
Theorem 2.3
Theorem 2.4
Definition 2.5
Lemma 2.5
Proof sketch
Theorem 2.6
Proof sketch
...and 81 more
