Predictive Coresets

Bernardo Flores

TL;DR

The paper addresses the scalability of Bayesian inference on massive datasets by replacing likelihood-based coreset weighting with weights chosen to match posterior predictive distributions. It introduces predictive coresets, built from a Dirichlet process (DP) proxy for the posterior predictive and an optimal-transport (OT) style transformation that maps full-data observations to a small weighted subset; the approach is model-agnostic and extends to nonparametric and non-Euclidean settings. The authors provide theoretical guarantees in the form of posterior contraction rates in Wasserstein spaces and demonstrate the method on density estimation, logistic regression, and random partitions, with adaptive extensions that accelerate hyperparameter exploration. Together, predictive-distribution matching, OT-based transport, and DP priors yield a transport-based coreset construction with convergence guarantees and algorithms applicable to large-scale Bayesian analysis and nontraditional data spaces.
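
To make the "DP-based proxy" concrete, the following recalls the standard posterior predictive of a Dirichlet process. The notation (concentration $c$, base distribution $G_0$, coreset size $m$, weights $w_j$) is assumed here for illustration and is not taken from the paper. If $P \sim \mathrm{DP}(c, G_0)$ and $x_1, \dots, x_n \mid P$ are i.i.d. from $P$, then

$$
x_{n+1} \mid x_{1:n} \;\sim\; \frac{c}{c+n}\, G_0 \;+\; \frac{1}{c+n} \sum_{i=1}^{n} \delta_{x_i}.
$$

A natural weighted analogue for a subset $x_{1}, \dots, x_{m}$ with nonnegative weights $w_1, \dots, w_m$, stated only as an illustrative assumption and not as the authors' definition, would be

$$
\frac{c}{c+\sum_{j=1}^{m} w_j}\, G_0 \;+\; \frac{1}{c+\sum_{j=1}^{m} w_j} \sum_{j=1}^{m} w_j\, \delta_{x_j}.
$$

When $\sum_{j} w_j = n$, the two expressions differ only in their atomic parts, so matching the predictives reduces to matching the empirical measure of the full data with the weighted empirical measure of the subset.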

Abstract

Modern data analysis often involves massive datasets with hundreds of thousands of observations, making traditional inference algorithms computationally prohibitive. Coresets are selection methods designed to choose a smaller, weighted subset of observations while maintaining similar learning performance. Conventional coreset approaches determine the weights by minimizing the Kullback-Leibler (KL) divergence between the likelihood functions of the full and weighted datasets; this makes them ill-posed for nonparametric models, where the likelihood is often intractable. We propose an alternative variational method that employs randomized posteriors and finds weights to match the unknown posterior predictive distributions conditioned on the full and reduced datasets. Our approach provides a general algorithm based on predictive recursions suitable for nonparametric priors. We evaluate the performance of the proposed coreset construction on diverse problems, including random partitions and density estimation.
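
The idea of choosing weights so that a small subset reproduces the predictive behaviour of the full data can be illustrated with a toy one-dimensional sketch. This is not the paper's algorithm: it is a minimal illustration that assumes real-valued data, a subset picked uniformly at random, and weights obtained by sending each observation to its nearest subset point. The function names `nearest_point_weights` and `wasserstein1` are hypothetical helpers written for this example.

```python
import bisect
import random


def nearest_point_weights(data, support):
    """Send each observation to its nearest support point.

    The weight of a support point is the number of observations mapped to it,
    so the weights sum to len(data).
    """
    pts = sorted(support)
    weights = [0.0] * len(pts)
    for x in data:
        j = bisect.bisect_left(pts, x)
        if j == 0:
            k = 0
        elif j == len(pts):
            k = len(pts) - 1
        else:
            # Pick the closer of the two neighbouring support points.
            k = j if pts[j] - x < x - pts[j - 1] else j - 1
        weights[k] += 1.0
    return pts, weights


def wasserstein1(xs, ws, ys, vs):
    """1-Wasserstein distance between two weighted empirical measures on the
    real line, computed as the integral of |F - G| over the pooled support."""
    total_w, total_v = sum(ws), sum(vs)
    events = sorted(
        [(x, w / total_w, 0.0) for x, w in zip(xs, ws)]
        + [(y, 0.0, v / total_v) for y, v in zip(ys, vs)]
    )
    dist, cdf_f, cdf_g, prev = 0.0, 0.0, 0.0, events[0][0]
    for pos, dw, dv in events:
        dist += abs(cdf_f - cdf_g) * (pos - prev)
        cdf_f += dw
        cdf_g += dv
        prev = pos
    return dist


# Toy data: a two-component Gaussian mixture (an assumption for this demo).
random.seed(0)
data = [
    random.gauss(-2.0, 0.5) if random.random() < 0.3 else random.gauss(1.0, 1.0)
    for _ in range(5000)
]
support = random.sample(data, 20)

pts, w = nearest_point_weights(data, support)
ones = [1.0] * len(data)

d_weighted = wasserstein1(data, ones, pts, w)
d_uniform = wasserstein1(data, ones, pts, [1.0] * len(pts))
print(f"weighted subset: {d_weighted:.4f}  unweighted subset: {d_uniform:.4f}")
```

For a fixed set of support points, the nearest-point assignment gives the smallest achievable Wasserstein distance to the full empirical measure, so the weighted subset is never worse than the same subset with equal weights. The sketch compares only the empirical (atomic) parts of the two predictives and ignores any prior contribution.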

Paper Structure

This paper contains 13 sections, 3 theorems, 22 equations, 5 figures, 1 table, and 3 algorithms.

Key Result

Theorem 1

Assume that $(\mathbb{X}, \mathtt{d})$ is a totally bounded metric space with packing number of order $N_\delta(\mathbb{X}, \mathtt{d})\sim \frac{1}{\delta^a}$ for some $a>0$ and small enough $\delta>0$. Then the following is a posterior contraction rate for the Dirichlet process with mean measure [...] with $\epsilon_{n, q}(\mathbb{X}, p^0)$ being the $q$-Wasserstein rate of convergence of the Gliven

Figures (5)

  • Figure 1: The left plot shows a histogram of the difference in estimated KL divergence between the posterior means for two cases: one comparing the coreset to the full dataset, and the other comparing the unit coreset to the full dataset. The right plot shows the posterior mean densities for all three cases.
  • Figure 2: On the left are the results for a coreset of size 20; on the right, the distance between the mean posterior logits for the coreset and the uniform subsample.
  • Figure 3: Data simulated from a mixture model, along with its induced clustering.
  • Figure 4: The panels show approximate posterior inference using a uniformly chosen coreset (left), the proposed coreset using Algorithm 3 (center), and full posterior inference (right).
  • Figure 5: Histogram of the differences in variation of information between the coreset and the full data, and the subsample and the full data.

Theorems & Definitions (6)

  • Definition 3.1
  • Theorem 1
  • Theorem 2
  • proof
  • Theorem 3
  • proof