Composable Coresets for Constrained Determinant Maximization and Beyond
Sepideh Mahabadi, Thuy-Duong Vuong
TL;DR
This work develops composable coresets for determinant maximization under partition and laminar matroid constraints, as well as for unconstrained determinant maximization and broader experimental design settings. It introduces peeling coresets for the without-repetition case and extends directional height analysis to the $k\ge d$ regime, achieving size $kd$ with a $d^{O(d)}$-approximation for $k>d$ and size $sk$ with a $k^{2k}$-approximation for $k\le d$. The results generalize to strongly Rayleigh distributions and to other design objectives via spectral spanners, enabling near-linear-time pipelines and practical speedups for MAP inference in partition settings. Lower bounds show that the size and approximation trade-offs are tight, while the laminar- and partition-matroid constructions yield scalable, composable summaries applicable to large-scale data summarization and design tasks.
Abstract
We study algorithms for constructing composable coresets for Determinant Maximization under a partition constraint. Given a point set $V\subset \mathbb{R}^d$ that is partitioned into $s$ groups $V_1,\ldots,V_s$, and integers $k_1,\ldots,k_s$ with $k=\sum_i k_i$, the goal is to pick $k_i$ points from each group $V_i$ such that the overall determinant of the $k$ picked points is maximized. Determinant Maximization and its constrained variants have attracted considerable interest for modeling diversity and have found applications in data summarization. When the cardinality $k$ of the selected set is greater than the dimension $d$, we show a peeling algorithm that gives a composable coreset of size $kd$ with a provably optimal approximation factor of $d^{O(d)}$. When $k\leq d$, we show a simple coreset construction with optimal size and approximation factor. As a further application of our technique, we get a composable coreset for determinant maximization under the more general laminar matroid constraints, and a composable coreset for unconstrained determinant maximization in a previously unresolved regime. Our results generalize to all strongly Rayleigh distributions and to several other experimental design problems. As an application, we improve the runtime of the practical local-search-based algorithm of [Anari-Vuong, COLT'22] for determinant maximization under a partition constraint from $O(n^{2^s}k^{2^s})$ to $O(n k^{2^s})$, making it linear in the number of points $n$.
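To make the problem statement concrete, the following is a minimal brute-force sketch of the partition-constrained objective; it is not the coreset construction or the local-search algorithm discussed above. It assumes the usual convention that the determinant of $k$ selected points is the Gram determinant $\det(XX^\top)$ when $k\le d$ and $\det(X^\top X)$ when $k>d$, where the rows of $X$ are the selected points. The function names `selection_determinant` and `brute_force_partition_max` are hypothetical and chosen only for illustration.

```python
import itertools
import numpy as np


def selection_determinant(points):
    """Determinant objective of a non-empty list of d-dimensional points.

    Assumption (not spelled out in the abstract): Gram determinant
    det(X X^T) when k <= d, and det(X^T X) when k > d.
    """
    X = np.asarray(points, dtype=float)
    k, d = X.shape
    M = X @ X.T if k <= d else X.T @ X
    return float(np.linalg.det(M))


def brute_force_partition_max(groups, ks):
    """Exhaustively pick exactly ks[i] points from groups[i].

    Exponential time; intended only for tiny illustrative inputs.
    Returns the best objective value and the chosen indices per group.
    """
    per_group = [
        itertools.combinations(range(len(g)), k_i)
        for g, k_i in zip(groups, ks)
    ]
    best_val, best_sel = float("-inf"), None
    for choice in itertools.product(*per_group):
        pts = [groups[i][j] for i, idx in enumerate(choice) for j in idx]
        val = selection_determinant(pts)
        if val > best_val:
            best_val, best_sel = val, choice
    return best_val, best_sel


# Two groups in R^2, one point required from each (k = 2 = d).
groups = [[(1.0, 0.0), (2.0, 0.0)], [(0.0, 1.0), (0.0, 3.0)]]
best_val, best_sel = brute_force_partition_max(groups, ks=[1, 1])
print(best_val, best_sel)
```

In this toy instance the selection takes the longer vector from each group, since the two groups lie along orthogonal axes. Exhaustive enumeration is feasible only for very small inputs, which is the motivation for summarizing each part of the data by a small coreset before running an optimization algorithm on the union.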
