Approximate Stein Classes for Truncated Density Estimation

Daniel J. Williams, Song Liu

TL;DR

This work tackles truncated density estimation, where the normalising constant is intractable and the truncation boundary has no explicit functional form. It introduces approximate Stein classes and the truncated kernelised Stein discrepancy (TKSD), which can be evaluated using only samples on the boundary and does not require fixing a weighting function in advance. The authors estimate unnormalised truncated density models by minimising the Lagrangian dual of TKSD and prove consistency under mild conditions. Experiments show TKSD to be competitive with existing methods such as TruncSM and bd-KSD while requiring less information about the boundary, including on complex boundaries such as the United States border.

Abstract

Estimating truncated density models is difficult, as these models have intractable normalising constants and hard-to-satisfy boundary conditions. Score matching can be adapted to solve the truncated density estimation problem, but requires a continuous weighting function which takes zero at the boundary and is positive elsewhere. Evaluation of such a weighting function (and its gradient) often requires a closed-form expression of the truncation boundary and finding a solution to a complicated optimisation problem. In this paper, we propose approximate Stein classes, which in turn lead to a relaxed Stein identity for truncated density estimation. We develop a novel discrepancy measure, truncated kernelised Stein discrepancy (TKSD), which does not require fixing a weighting function in advance, and can be evaluated using only samples on the boundary. We estimate a truncated density model by minimising the Lagrangian dual of TKSD. Finally, experiments show the accuracy of our method to be an improvement over previous works even without the explicit functional form of the boundary.
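
To make the weighting-function requirement concrete, the sketch below compares two ways of computing a function that is zero on the boundary and positive inside the domain. It is only an illustration, not the method of the paper: the choice of Euclidean distance to the boundary as the weighting function, the unit disc as the truncation domain, and the names `exact_weight_unit_ball` and `sample_weight` are all assumptions made for this example. The first function needs the closed form of the boundary; the second uses only a finite set of boundary samples.

```python
import numpy as np

def exact_weight_unit_ball(x):
    """Distance from each row of x to the unit l2 sphere (x inside the ball).

    Requires the closed form of the boundary, here ||x|| = 1 (assumed example).
    """
    return 1.0 - np.linalg.norm(x, axis=-1)

def sample_weight(x, boundary_points):
    """Distance from each row of x to its nearest boundary sample.

    Uses only a finite set of points known to lie on the boundary.
    """
    diffs = x[:, None, :] - boundary_points[None, :, :]
    return np.linalg.norm(diffs, axis=-1).min(axis=1)

rng = np.random.default_rng(0)

# m evenly spaced samples on the unit circle (hypothetical boundary samples).
m = 256
angles = np.linspace(0.0, 2.0 * np.pi, m, endpoint=False)
boundary = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Points strictly inside the unit disc.
x = rng.uniform(-0.7, 0.7, size=(100, 2))

exact = exact_weight_unit_ball(x)
approx = sample_weight(x, boundary)
max_gap = np.max(np.abs(approx - exact))
print(f"largest gap between exact and sample-based weight: {max_gap:.4f}")
```

In this toy setting the sample-based distance never falls below the exact one, and the gap shrinks as the number of boundary samples grows. For an irregular boundary no closed form like `exact_weight_unit_ball` is available, which is the situation the abstract describes.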

Paper Structure (40 sections, 8 theorems, 109 equations, 10 figures)

Key Result

Lemma 5.1

Let $q$ be a smooth density supported on $V$. For any $\boldsymbol{g} \in \mathcal{G}^d_0$, then

Figures (10)

  • Figure 1: Density estimation when the truncation boundary is the border of the U.S., as described in \ref{sec:usa}. Top: example of increasing the number of boundary points $m$. Bottom: across 256 seeds for each value of $m$, mean estimation error with standard error bars for the mean of a 2D Gaussian, for TKSD and TruncSM as $m$ increases.
  • Figure 2: Mean estimation error across 256 seeds, with standard error bars, as dimension $d$ increases (left) and runtime for each method (right). The truncation domain is the $\ell_2$ ball of radius $d^{0.53}$ (top) and $\ell_1$ ball of radius $d$ (bottom).
  • Figure 3: Contour lines of the optimal $\tilde{g}_0$ (first dimension of $\tilde{\boldsymbol{g}}$) output by TKSD across different values of $m$ and differently shaped truncation boundaries: the $\ell_1$ ball (top), the $\ell_2$ ball (middle) and a heart shape (bottom). Red points are all $m$ points in $\widetilde{\partial V}$ and grey points are samples from the truncated dataset. Note that we plot only the first dimension of $\tilde{\boldsymbol{g}}$ (i.e. $g_1$), but we observe the same pattern with the second dimension.
  • Figure 4: Lower bound on $\varepsilon_m$ (given in \ref{eq:epsilon}) against $m$, the finite number of boundary points, plotted for different values of fixed dimension $d$ and boundary 'size' $L(V)$, which scales quadratically.
  • Figure 5: Left: Estimation error as $n$ and $m$ increase for TKSD only. Right: Mean estimation error for the three methods: TKSD, TruncSM and bd-KSD, with standard error bars. TKSD uses a fixed $m=32$ across all values of $n$. Both plots report statistics over 64 seeds.
  • ...and 5 more figures

Theorems & Definitions (13)

  • Definition 3.1
  • Definition 4.1
  • Lemma 5.1
  • Lemma 5.2
  • Proposition 5.3
  • Remark 5.4
  • Theorem 5.5
  • Theorem 5.6
  • Theorem 5.7
  • Theorem 5.10
  • ...and 3 more