Accurate Coresets for Latent Variable Models and Regularized Regression

Sanskar Ranjan, Supratim Shit

TL;DR

This paper introduces a unified framework for constructing accurate coresets, and presents accurate coreset construction algorithms for general problems, including a wide range of latent variable model problems and $\ell_p$-regularized $\ell_p$-regression.

Abstract

An accurate coreset is a weighted subset of the original dataset such that a model trained on the coreset attains the same accuracy as a model trained on the full dataset. So far, accurate coresets have been studied mainly for a limited range of machine learning models. In this paper, we introduce a unified framework for constructing accurate coresets. Using this framework, we present accurate coreset construction algorithms for general problems, including a wide range of latent variable model problems and $\ell_p$-regularized $\ell_p$-regression. For latent variable models, our coreset size is $O\left(\mathrm{poly}(k)\right)$, where $k$ is the number of latent variables. For $\ell_p$-regularized $\ell_p$-regression, our algorithm captures the reduction in model complexity due to regularization, resulting in a coreset whose size is always smaller than $d^{p}$ for a regularization parameter $\lambda > 0$, where $d$ is the dimension of the input points. This inherently improves the size of the accurate coreset for ridge regression. We substantiate our theoretical findings with extensive experimental evaluations on real datasets.

Paper Structure (22 sections, 9 theorems, 28 equations, 1 figure, 3 tables, 6 algorithms)


Key Result

Theorem 1

Let $\mathbf{P}$ be a set of $n$ points in $\mathbb{R}^d$ that spans a $k$-dimensional space. If $\mathbf{x}$ is a point inside the convex hull of $\mathbf{P}$, then $\mathbf{x}$ is also in the convex hull of at most $k+1$ weighted points of $\mathbf{P}$.
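
Theorem 1 can be illustrated numerically. The following is a minimal sketch, not the paper's algorithm: it uses the textbook elimination argument, in which a null-space direction of the support points is used to drive one weight to zero per iteration while keeping the weighted mean fixed. The function name `caratheodory`, the tolerance `tol`, and the random test data are assumptions made for illustration only.

```python
import numpy as np

def caratheodory(points, weights, tol=1e-10):
    """Reduce a convex combination to affinely independent support points.

    points:  (n, d) array; weights: (n,) nonnegative, summing to 1.
    Returns new weights with the same weighted mean and at most k + 1
    nonzero entries, where k is the dimension spanned by the points.
    Illustrative sketch only (hypothetical helper, not from the paper).
    """
    points = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float).copy()
    while True:
        support = np.flatnonzero(w > tol)
        if support.size <= 1:
            break
        # Columns are p_i - p_0 for the support points i >= 1.
        diffs = (points[support[1:]] - points[support[0]]).T
        _, s, vt = np.linalg.svd(diffs)
        rank = int(np.sum(s > tol * max(s[0], 1.0)))
        if rank == support.size - 1:
            break  # support is affinely independent; nothing left to remove
        # Last right singular vector lies in the null space of diffs.
        v_tail = vt[-1]
        # Extend to v with sum(v) = 0 and sum_i v_i * p_i = 0.
        v = np.concatenate(([-v_tail.sum()], v_tail))
        pos = v > 0
        ratios = w[support][pos] / v[pos]
        j = np.argmin(ratios)
        # Largest step that keeps every weight nonnegative.
        w[support] = w[support] - ratios[j] * v
        w[support[np.flatnonzero(pos)[j]]] = 0.0  # force exact removal
        np.clip(w, 0.0, None, out=w)
    w[w <= tol] = 0.0
    return w

# Example: 50 random points in R^3 with uniform weights.
rng = np.random.default_rng(0)
P = rng.normal(size=(50, 3))
u = np.full(50, 1.0 / 50)
w = caratheodory(P, u)

# Example: 30 points in R^5 lying in a 2-dimensional subspace (k = 2).
P_low = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 5))
u_low = np.full(30, 1.0 / 30)
w_low = caratheodory(P_low, u_low)

print(np.count_nonzero(w), np.count_nonzero(w_low))
```

In the first example the reduced weights have at most $d + 1 = 4$ nonzero entries; in the second, at most $k + 1 = 3$, since the points span only a 2-dimensional space. In both cases `w @ P` matches `u @ P` up to floating-point error. Recomputing a full SVD in every iteration keeps the sketch short but is not efficient for large $n$.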

Figures (1)

  • Figure 1: Accurate Coreset Size vs Regularization Parameter $\lambda$

Theorems & Definitions (18)

  • Theorem 1
  • Definition 1: $\mathrm{Kernelization}$
  • Lemma 2
  • proof
  • Theorem 3
  • proof
  • Lemma 4
  • proof
  • Theorem 5
  • proof
  • ...and 8 more