Composable Coresets for Constrained Determinant Maximization and Beyond

Sepideh Mahabadi; Thuy-Duong Vuong

TL;DR

This work develops composable coresets for determinant maximization under partition and laminar matroid constraints, plus unconstrained and broader experimental design settings. It introduces peeling coresets for the without-repetition case and extends directional height analysis to the $k\ge d$ regime, achieving size $kd$ with $d^{O(d)}$-approximation for $k> d$ and size $sk$ with $k^{2k}$-approximation for $k\le d$. The results generalize to strongly Rayleigh distributions and to other design objectives via spectral spanners, enabling near-linear-time pipelines and practical speedups for MAP-inference in partition settings. Lower bounds show tightness of the size and approximation trade-offs, while the laminar- and partition-matroid constructions yield scalable, composable summaries applicable to large-scale data summarization and design tasks.

Abstract

We study algorithms for constructing composable coresets for Determinant Maximization under a partition constraint. Given a point set $V\subset \mathbb{R}^d$ that is partitioned into $s$ groups $V_1,\ldots, V_s$, and integers $k_1,\ldots,k_s$ with $k=\sum_i k_i$, the goal is to pick $k_i$ points from each group $V_i$ such that the overall determinant of the $k$ picked points is maximized. Determinant Maximization and its constrained variants have attracted considerable interest for modeling diversity and have found applications in data summarization. When the cardinality $k$ of the selected set is greater than the dimension $d$, we show a peeling algorithm that gives a composable coreset of size $kd$ with a provably optimal approximation factor of $d^{O(d)}$. When $k\leq d$, we show a simple coreset construction with optimal size and approximation factor. As a further application of our technique, we get a composable coreset for determinant maximization under the more general laminar matroid constraints, and a composable coreset for unconstrained determinant maximization in a previously unresolved regime. Our results generalize to all strongly Rayleigh distributions and to several other experimental design problems. As an application, we improve the runtime of the practical local-search-based algorithm of [Anari-Vuong--COLT'22] for determinant maximization under a partition constraint from $O(n^{2^s}k^{2^s})$ to $O(n k^{2^s})$, making it linear in the number of points $n$.
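
To make the problem statement concrete, the following is a minimal brute-force sketch of determinant maximization under a partition constraint. It is not the paper's algorithm and is exponential in the input size. It assumes the $k\leq d$ setting and takes the objective to be the determinant of the Gram matrix of the picked points (their squared volume), which is an assumption here since the abstract does not spell out the objective. The names `det`, `gram_det`, and `best_partition_selection` are hypothetical helpers introduced for illustration.

```python
from itertools import combinations, product


def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [[float(x) for x in row] for row in m]
    n, sign, result = len(a), 1.0, 1.0
    for c in range(n):
        # Pick the row with the largest pivot in column c.
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0  # numerically singular
        if p != c:
            a[c], a[p] = a[p], a[c]
            sign = -sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return sign * result


def gram_det(points):
    """Determinant of the Gram matrix (assumed objective for k <= d)."""
    gram = [[sum(x * y for x, y in zip(p, q)) for q in points] for p in points]
    return det(gram)


def best_partition_selection(groups, quotas):
    """Exhaustively pick quotas[i] points from groups[i] to maximize gram_det."""
    per_group = [combinations(g, k) for g, k in zip(groups, quotas)]
    best_val, best_sel = -1.0, None
    for choice in product(*per_group):
        selection = [p for part in choice for p in part]
        val = gram_det(selection)
        if val > best_val:
            best_val, best_sel = val, selection
    return best_val, best_sel


# Two groups in R^3, one point to pick from each (k_1 = k_2 = 1, k = 2).
groups = [[(1, 0, 0), (2, 0, 0)], [(0, 1, 0), (0, 1, 1)]]
value, selection = best_partition_selection(groups, [1, 1])
print(value, selection)  # 8.0 [(2, 0, 0), (0, 1, 1)]
```

The exhaustive search enumerates every feasible selection, which is only practical for tiny inputs. A composable coreset would instead shrink each part of a distributed input to a small summary before any such search or local-search routine runs on the union of the summaries.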

Paper Structure

This paper contains 16 sections, 19 theorems, 53 equations, 1 table.

Introduction
Our Results
Lower bounds.
Application.
Overview of the Techniques
Preliminaries
Matroids
Determinant Maximization and Experimental Design Problems
Composable Coresets
Directional Height
Strongly Rayleigh Distribution and Exchange Inequalities
Unconstrained Case: the Peeling Coreset
The Algorithm
Composable Coresets for Partition and Laminar Matroids
Other Experimental Design Problems
...and 1 more section

Key Result

Theorem 2.5

Let $k \leq d$ and $V\subseteq\mathbb{R}^d$. Then any size-$k$ local optimum $U$ with respect to $\det(\cdot)$ inside $V$ approximately preserves the $k$-directional height. That is, for any $(k-1)$-dimensional subspace $H$, the height $d(U,H)$ approximates $d(V,H)$, where for a point set $P$ we define $d(P,H)=\max_{p\in P} d(p,H)$.
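
The quantity $d(P,H)$ in the theorem can be computed directly, as in the rough sketch below. It assumes that $H$ is a linear subspace through the origin given by a list of spanning vectors and that $d(p,H)$ is the Euclidean distance from $p$ to $H$. The function names are hypothetical, and the subset `U` in the example is an arbitrary illustration, not a verified local optimum.

```python
import math


def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: return an orthonormal basis of span(vectors)."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            c = sum(x * y for x, y in zip(w, b))
            w = [x - c * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm > tol:  # skip vectors already in the span
            basis.append([x / norm for x in w])
    return basis


def dist_to_subspace(p, basis):
    """Euclidean distance d(p, H) for H spanned by an orthonormal basis."""
    r = [float(x) for x in p]
    for b in basis:
        c = sum(x * y for x, y in zip(r, b))
        r = [x - c * y for x, y in zip(r, b)]
    return math.sqrt(sum(x * x for x in r))


def directional_height(points, spanning_vectors):
    """d(P, H) = max over p in P of d(p, H)."""
    basis = orthonormal_basis(spanning_vectors)
    return max(dist_to_subspace(p, basis) for p in points)


# k = 2, so H is a 1-dimensional subspace; here H = span{(1, 0, 0)}.
V = [(3, 0, 0), (0, 2, 0), (0, 0, 1), (1, 1, 1)]
U = [(3, 0, 0), (0, 2, 0)]  # hypothetical size-k subset of V
H = [(1, 0, 0)]
print(directional_height(V, H), directional_height(U, H))  # 2.0 2.0
```

Comparing `directional_height(U, H)` against `directional_height(V, H)` over many subspaces `H` gives an empirical feel for how well a candidate subset preserves the directional height of the full point set.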

Theorems & Definitions (51)

Definition 1.1: $\det_k$
Definition 2.1: Local optima
Definition 2.2: Partition matroid
Definition 2.3: Laminar matroid
Definition 2.4: Directional height and $k$-directional height [mahabadi2019composable]
Theorem 2.5: Coreset for $k$-directional height [mahabadi2019composable]
Definition 2.6: Strongly Rayleigh
Lemma 2.7
Definition 3.1: Value-preserving set
Lemma 3.2
...and 41 more
