The geometry of Stein's method of moments: A canonical decomposition via score matching

Mitsuki Nagai, Keisuke Yano

Abstract

In this paper, we elucidate the geometry of Stein's method of moments (SMoM). SMoM is a parameter estimation method based on the Stein operator, and it yields a wide class of estimators that do not depend on the normalizing constant. We present a canonical decomposition of an SMoM estimator centered at the score matching estimator, which sheds light on the central role of score matching within the SMoM framework. Using this decomposition, we construct an SMoM estimator that improves upon the score matching estimator in terms of asymptotic variance. We also discuss the connection between SMoM and the Wasserstein geometry. Specifically, using the Wasserstein score function, we provide a geometrical interpretation of the gap in asymptotic variance between the score matching estimator and the maximum likelihood estimator. Furthermore, we show that the score matching estimator is asymptotically efficient if and only if the Fisher score functions span the same space as the Wasserstein score functions.

Paper Structure (22 sections, 12 theorems, 103 equations, 6 figures, 3 tables)

Key Result

Lemma 1

The SMoM estimator based on the test functions $f_{\theta,j}\coloneqq\nabla_{x}\partial_{\theta_j}\log q_{\theta}, \enspace j=1,\dots,d$ is the score matching estimator.
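
Lemma 1 can be checked numerically on a toy model. The sketch below assumes the Langevin form of the Stein operator, $\mathcal{A}_{\theta}f = \nabla_x\cdot f + f\cdot\nabla_x\log q_{\theta}$, which is not spelled out in this summary, and uses a one-dimensional Gaussian $q_\theta = N(\mu, s)$ with $\theta=(\mu, s)$ and $s$ the variance. All function names are illustrative and not taken from the paper. Under these assumptions, the empirical Stein moment conditions for $f_{\theta,j}$ should coincide with the gradient of the empirical score matching objective, so both vanish at the same estimate.

```python
import random


def stein_moment_conditions(xs, mu, s):
    """Empirical means of A_theta f_{theta,j} for N(mu, s), s = variance.

    Assumption: Langevin Stein operator A f = f' + f * d/dx log q,
    with d/dx log q = -(x - mu) / s.
    """
    n = len(xs)
    # f_mu = d/dx d/dmu log q = 1/s            => A f_mu = -(x - mu) / s**2
    g_mu = sum(-(x - mu) / s**2 for x in xs) / n
    # f_s  = d/dx d/ds log q = (x - mu) / s**2 => A f_s = 1/s**2 - (x - mu)**2 / s**3
    g_s = sum(1.0 / s**2 - (x - mu) ** 2 / s**3 for x in xs) / n
    return g_mu, g_s


def score_matching_objective(xs, mu, s):
    """Empirical Hyvarinen objective: mean of 0.5*(d/dx log q)^2 + d^2/dx^2 log q."""
    n = len(xs)
    return sum(0.5 * ((x - mu) / s) ** 2 - 1.0 / s for x in xs) / n


def finite_difference_gradient(xs, mu, s, h=1e-5):
    """Central-difference gradient of the score matching objective in (mu, s)."""
    d_mu = (score_matching_objective(xs, mu + h, s)
            - score_matching_objective(xs, mu - h, s)) / (2 * h)
    d_s = (score_matching_objective(xs, mu, s + h)
           - score_matching_objective(xs, mu, s - h)) / (2 * h)
    return d_mu, d_s


def score_matching_estimate(xs):
    """Closed-form minimizer for this model: sample mean and (biased) sample variance."""
    n = len(xs)
    mu = sum(xs) / n
    s = sum((x - mu) ** 2 for x in xs) / n
    return mu, s


rng = random.Random(0)
xs = [rng.gauss(1.5, 2.0) for _ in range(2000)]  # synthetic data, for illustration only

# At an arbitrary parameter value, the Stein moments match the objective's gradient.
g_mu, g_s = stein_moment_conditions(xs, 0.7, 3.0)
d_mu, d_s = finite_difference_gradient(xs, 0.7, 3.0)

# At the score matching estimate, the Stein moment conditions are (numerically) zero.
mu_hat, s_hat = score_matching_estimate(xs)
r_mu, r_s = stein_moment_conditions(xs, mu_hat, s_hat)
```

The Gaussian is chosen only because every quantity is available in closed form; the same comparison could be run for any model with a differentiable unnormalized log-density.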

Figures (6)

  • Figure 1: Schematic diagram of the improvement of the asymptotic variance $\mathop{\mathrm{AVar}}\nolimits[\hat{\theta}_\mathrm{SM}]$ of the score matching estimator via SMoM. For $K=1$, the asymptotic variance $\mathop{\mathrm{AVar}}\nolimits[\hat{\theta}[\theta^\star]]$ is minimized at $c_1^\star$, and it is lower than $\mathop{\mathrm{AVar}}\nolimits[\hat{\theta}_\mathrm{SM}]$. For $K=2$, $\mathop{\mathrm{AVar}}\nolimits[\hat{\theta}_\mathrm{SM}]$ is further improved at $c^\star$. $\mathop{\mathrm{AVar}}\nolimits[\hat{\theta}[\theta^\star]]$ approaches the efficiency bound as $K$ increases.
  • Figure 2: The asymptotic relative efficiency $\mathop{\mathrm{AVar}}\nolimits[\hat{\theta}_\mathrm{MLE}]/\mathop{\mathrm{AVar}}\nolimits[\hat{\theta}_\mathrm{SM}]$ with respect to $\beta$ (blue) along with its limits (gray).
  • Figure 3: MSE ratio for $\hat{\theta}[\hat{\theta}_\mathrm{SM}]$ relative to $\hat{\theta}_\mathrm{SM}$ versus the geometric mean of estimates given by \ref{eq:estimate_improvement}. Points of the same color correspond to different pairs of $\tilde{v}_\alpha$. The horizontal and vertical lines (gray) represent the asymptotic relative efficiency for the MLE calculated by \ref{eq:generalized_normal_ratio}. The horizontal axis is truncated at 2. Values less than 1 on the horizontal axis indicate that the corresponding SMoM estimator improves the variance of the score matching estimator. Points near the diagonal indicate that the estimate of the asymptotic relative efficiency is reliable.
  • Figure 4: Test functions for $\hat{\theta}_\mathrm{SM}$ (blue), $\hat{\theta}_\mathrm{MLE}$ (orange), and $\hat{\theta}[\hat{\theta}_\mathrm{SM}]$ (gray). For $\hat{\theta}[\hat{\theta}_\mathrm{SM}]$, the mean values over iterations are plotted, where each line corresponds to a different pair of $\tilde{v}_\alpha$. A test function close to the test function of the MLE implies that the corresponding SMoM estimator is also close to the MLE.
  • Figure 5: MSE ratio of the PPI model for $\hat{\theta}[\hat{\theta}_\mathrm{wSM}]$ relative to $\hat{\theta}_\mathrm{wSM}$ versus the geometric mean of estimates given by \ref{eq:estimate_improvement_M}. Points of the same color correspond to different pairs of $\tilde{v}_\alpha$. The horizontal axis is truncated at 2. Values less than 1 on the horizontal axis indicate that the corresponding SMoM estimator improves the variance of the score matching estimator. Points near the diagonal indicate that the estimate of the asymptotic relative efficiency is reliable.
  • ...and 1 more figure

Theorems & Definitions (35)

  • Lemma 1: Section 2.1 of Eguchi2025; Lemma 3.2 of Kume2026
  • proof
  • Remark 1: Test functions and gradient vector fields
  • Definition 1
  • Theorem 1: Canonical decomposition of SMoM estimator
  • Remark 2: $W$-orthogonality does not imply $\mathcal{A}_{\theta^\star}$-orthogonality
  • Lemma 2
  • proof
  • proof: Proof of Theorem \ref{thm:smom_decomp}
  • Example 1: Exponential families
  • ...and 25 more