The geometry of Stein's method of moments: A canonical decomposition via score matching

Mitsuki Nagai; Keisuke Yano

Abstract

In this paper, we elucidate the geometry of Stein's method of moments (SMoM). SMoM is a parameter estimation method based on the Stein operator and yields a wide class of estimators that do not depend on the normalizing constant. We present a canonical decomposition of an SMoM estimator after centering the score matching estimator, which sheds light on the central role of score matching within the SMoM framework. Using this decomposition, we construct an SMoM estimator that improves upon the score matching estimator in terms of asymptotic variance. We also discuss the connection between SMoM and the Wasserstein geometry. Specifically, using the Wasserstein score function, we provide a geometric interpretation of the gap in asymptotic variance between the score matching estimator and the maximum likelihood estimator. Furthermore, we show that the score matching estimator is asymptotically efficient if and only if the Fisher score functions span the same space as the Wasserstein score functions.

Paper Structure (22 sections, 12 theorems, 103 equations, 6 figures, 3 tables)

Introduction
Literature review and contributions
Organization
Preliminaries
Main results
A canonical decomposition of SMoM estimators
Improving asymptotic variance of score matching estimator
Connections between SMoM and the Wasserstein geometry
Numerical experiments
Generalized normal distribution
Polynomially tilted pairwise interaction model
Conclusion
Acknowledgement
Regularity conditions
Proofs in Section \ref{sec:Wasserstein}
...and 7 more sections

Key Result

Lemma 1

The SMoM estimator based on the test functions $f_{\theta,j}\coloneqq\nabla_{x}\partial_{\theta_j}\log q_{\theta}, \enspace j=1,\dots,d$ is the score matching estimator.
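
The lemma can be checked numerically in a simple case. The sketch below is an illustration, not the authors' code: it assumes the Langevin Stein operator $\mathcal{A}_\theta f = \nabla_x \log q_\theta \cdot f + \nabla_x \cdot f$ and a one-dimensional Gaussian model $q_\theta = N(\mu, v)$ with $\theta = (\mu, v)$, neither of which is specified in the text above. Under that assumption, $\mathcal{A}_\theta f_{\theta,j} = \partial_{\theta_j}\bigl(\tfrac{1}{2}\lvert\nabla_x \log q_\theta\rvert^2 + \Delta_x \log q_\theta\bigr)$, so the SMoM estimating equations coincide with the stationarity condition of the score matching objective. All function names are hypothetical.

```python
import numpy as np

def stein_moment_equations(theta, x):
    """Empirical mean of A_theta f_{theta,j}(x) for N(mu, v), theta = (mu, v).

    Assumption: Langevin Stein operator A_theta f = (d/dx log q_theta) * f + f'.
    """
    mu, v = theta
    score = -(x - mu) / v                      # d/dx log q_theta(x)
    # Test functions f_{theta,j} = d/dx d/dtheta_j log q_theta and their x-derivatives.
    f_mu, df_mu = np.full_like(x, 1.0 / v), np.zeros_like(x)
    f_v, df_v = (x - mu) / v**2, np.full_like(x, 1.0 / v**2)
    return np.array([np.mean(score * f_mu + df_mu),
                     np.mean(score * f_v + df_v)])

def score_matching_objective(theta, x):
    """Empirical score matching objective: mean of 0.5 * score^2 + d(score)/dx."""
    mu, v = theta
    score = -(x - mu) / v
    dscore = -1.0 / v
    return np.mean(0.5 * score**2 + dscore)

def numerical_gradient(fun, theta, x, eps=1e-6):
    """Central finite-difference gradient of fun with respect to theta."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (fun(theta + step, x) - fun(theta - step, x)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = rng.normal(loc=1.5, scale=2.0, size=5000)

# Closed-form minimizer of the score matching objective for this model.
theta_sm = np.array([x.mean(), x.var()])
residual = stein_moment_equations(theta_sm, x)

# At an arbitrary parameter value, the Stein moments match the objective's gradient.
theta0 = np.array([0.3, 2.5])
stein_at_theta0 = stein_moment_equations(theta0, x)
grad_at_theta0 = numerical_gradient(score_matching_objective, theta0, x)
```

Here `residual` vanishes up to floating-point error, so the score matching estimator solves the SMoM equations, and `stein_at_theta0` agrees with `grad_at_theta0` up to finite-difference error. For this Gaussian model the score matching estimator also coincides with the maximum likelihood estimator, so the example illustrates the lemma but not the efficiency gap discussed in the abstract.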

Figures (6)

Figure 1: Schematic diagram of the improvement of the asymptotic variance $\mathop{\mathrm{AVar}}\nolimits[\hat{\theta}_\mathrm{SM}]$ of the score matching estimator via SMoM. For $K=1$, the asymptotic variance $\mathop{\mathrm{AVar}}\nolimits[\hat{\theta}[\theta^\star]]$ is minimized at $c_1^\star$, and it is lower than $\mathop{\mathrm{AVar}}\nolimits[\hat{\theta}_\mathrm{SM}]$. For $K=2$, $\mathop{\mathrm{AVar}}\nolimits[\hat{\theta}_\mathrm{SM}]$ is further improved at $c^\star$. $\mathop{\mathrm{AVar}}\nolimits[\hat{\theta}[\theta^\star]]$ approaches the efficiency bound as $K$ increases.
Figure 2: The asymptotic relative efficiency $\mathop{\mathrm{AVar}}\nolimits[\hat{\theta}_\mathrm{MLE}]/\mathop{\mathrm{AVar}}\nolimits[\hat{\theta}_\mathrm{SM}]$ with respect to $\beta$ (blue) along with its limits (gray).
Figure 3: MSE ratio for $\hat{\theta}[\hat{\theta}_\mathrm{SM}]$ relative to $\hat{\theta}_\mathrm{SM}$ versus the geometric mean of estimates given by \ref{eq:estimate_improvement}. Points of the same color correspond to different pairs of $\tilde{v}_\alpha$. The horizontal and vertical lines (gray) represent the asymptotic relative efficiency for the MLE calculated by \ref{eq:generalized_normal_ratio}. The horizontal axis is truncated at 2. Values less than 1 on the horizontal axis indicate that the corresponding SMoM estimator improves the variance of the score matching estimator. Points near the diagonal indicate that the estimate of the asymptotic relative efficiency is reliable.
Figure 4: Test functions for $\hat{\theta}_\mathrm{SM}$ (blue), $\hat{\theta}_\mathrm{MLE}$ (orange), and $\hat{\theta}[\hat{\theta}_\mathrm{SM}]$ (gray). For $\hat{\theta}[\hat{\theta}_\mathrm{SM}]$, the mean values over iterations are plotted, where each line corresponds to a different pair of $\tilde{v}_\alpha$. A test function close to the test function of the MLE implies that the corresponding SMoM estimator is also close to the MLE.
Figure 5: MSE ratio for the PPI model for $\hat{\theta}[\hat{\theta}_\mathrm{wSM}]$ relative to $\hat{\theta}_\mathrm{wSM}$ versus the geometric mean of estimates given by \ref{eq:estimate_improvement_M}. Points of the same color correspond to different pairs of $\tilde{v}_\alpha$. The horizontal axis is truncated at 2. Values less than 1 on the horizontal axis indicate that the corresponding SMoM estimator improves the variance of the score matching estimator. Points near the diagonal indicate that the estimate of the asymptotic relative efficiency is reliable.
...and 1 more figure

Theorems & Definitions (35)

Lemma 1: Section 2.1 of Eguchi2025; Lemma 3.2 of Kume2026
proof
Remark 1: Test functions and gradient vector fields
Definition 1
Theorem 1: Canonical decomposition of SMoM estimator
Remark 2: $W$-orthogonality does not imply $\mathcal{A}_{\theta^\star}$-orthogonality
Lemma 2
proof
proof: Proof of Theorem \ref{thm:smom_decomp}
Example 1: Exponential families
...and 25 more
