Robust inference using density-powered Stein operators

Shinto Eguchi

TL;DR

The paper introduces the γ-Stein operator, a density-power weighted generalization of Stein operators derived from the γ-divergence, to enable robust, normalization-constant-free inference for unnormalized models. It develops a γ-score matching framework (γ-SME) that yields unbiased estimating functions independent of normalizers and, via a GMM approach, provides robust parameter estimation across diverse models. It further extends robustness to kernel-based goodness-of-fit testing (γ-KSD) and to robust variational inference (γ-SVGD), with empirical demonstrations on contaminated models including vMF, FB, normal mixtures, and a quartic potential. The approach is grounded in an information-geometric transport-variational calculus, offering practical procedures for robustness in energy-based models and Bayesian inference, along with principled γ-selection via anchored cross-validation. Overall, the γ-Stein framework unifies robustness, transport geometry, and score-based learning, delivering scalable, normalizer-free inference tools in the presence of outliers and misspecification.

Abstract

We introduce a density-power weighted variant of the Stein operator, called the $γ$-Stein operator. This is a novel class of operators derived from the $γ$-divergence, designed to build robust inference methods for unnormalized probability models. The operator's construction (weighting by the model density raised to a positive power $γ$) inherently down-weights the influence of outliers, providing a principled mechanism for robustness. Applying this operator yields a robust generalization of score matching that retains the crucial property of being independent of the model's normalizing constant. We extend this framework to develop two key applications: the $γ$-kernelized Stein discrepancy for robust goodness-of-fit testing, and $γ$-Stein variational gradient descent for robust Bayesian posterior approximation. Empirical results on contaminated Gaussian and quartic potential models show our methods significantly outperform standard baselines in both robustness and statistical efficiency.
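
The down-weighting mechanism described in the abstract can be illustrated with a minimal sketch. This is not the paper's $γ$-score matching estimator; it is the classical density-power weighted estimating equation for a Gaussian location with known scale, $\sum_i w_i (x_i - \mu) = 0$ with $w_i \propto q_\mu(x_i)^{γ}$. The function name `gamma_weighted_location`, the contamination setup, and the choice $γ = 0.5$ are all hypothetical, chosen only for illustration.

```python
import math
import random


def gamma_weighted_location(xs, gamma, sigma=1.0, iters=100):
    """Solve sum_i w_i * (x_i - mu) = 0 by fixed-point iteration, where
    w_i = exp(-gamma * (x_i - mu)^2 / (2 sigma^2)) is the unnormalized
    Gaussian density raised to the power gamma (illustrative assumption)."""
    # Start from the sample median, a robust initial value.
    mu = sorted(xs)[len(xs) // 2]
    for _ in range(iters):
        ws = [math.exp(-gamma * (x - mu) ** 2 / (2 * sigma ** 2)) for x in xs]
        # Any constant factor in the density cancels in this ratio, so the
        # update never needs the normalizing constant.
        mu = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
    return mu


# Hypothetical contaminated sample: 90% from N(0, 1), 10% outliers near 8.
rng = random.Random(0)
clean = [rng.gauss(0.0, 1.0) for _ in range(450)]
outliers = [rng.gauss(8.0, 1.0) for _ in range(50)]
data = clean + outliers

mean_est = sum(data) / len(data)                       # pulled toward the outliers
robust_est = gamma_weighted_location(data, gamma=0.5)  # outliers get near-zero weight

print(f"sample mean:          {mean_est:.3f}")
print(f"gamma-weighted (0.5): {robust_est:.3f}")
```

In this toy setting, points far from the current location receive weights close to zero, so the weighted estimate stays near the center of the clean component while the plain sample mean is shifted by the contamination. Setting `gamma=0` makes every weight equal to one and recovers the sample mean.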

Paper Structure

This paper contains 14 sections, 6 theorems, 119 equations, 2 figures, 6 tables.

Key Result

Theorem 3

Let $\mu_{\gamma}(dx) \coloneqq p(x)q(x)^{\gamma}dx$ be a mixed weighting measure. The expectation of the $\gamma$-Stein operator under $p$ is the inner product of the score difference with $f$:
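
The displayed identity is not reproduced in this summary. As a hedged sketch of what the statement describes, suppose $s_p = \nabla \log p$ and $s_q = \nabla \log q$ denote the scores, and assume the $\gamma$-Stein operator acts on a vector field $f$ as $\mathcal{S}_q^{(\gamma)} f = q^{-1}\,\nabla\cdot(q^{1+\gamma} f)$. The symbol $\mathcal{S}_q^{(\gamma)}$, this particular form, and the sign convention are assumptions made here for illustration; the paper's definition may differ by a constant or a sign. Under these assumptions, and with boundary terms vanishing, integration by parts gives:

```latex
\begin{align*}
\mathcal{S}_q^{(\gamma)} f(x)
  &= q(x)^{\gamma}\bigl\{(1+\gamma)\, s_q(x)^{\top} f(x) + \nabla\cdot f(x)\bigr\}, \\
\mathbb{E}_p\bigl[\mathcal{S}_q^{(\gamma)} f(X)\bigr]
  &= \int \frac{p(x)}{q(x)}\,\nabla\cdot\bigl(q(x)^{1+\gamma} f(x)\bigr)\,dx \\
  &= -\int q(x)^{1+\gamma}\, f(x)^{\top}\,\nabla\!\left(\frac{p(x)}{q(x)}\right) dx \\
  &= \int \bigl(s_q(x) - s_p(x)\bigr)^{\top} f(x)\;\mu_{\gamma}(dx).
\end{align*}
```

The last line uses $\nabla(p/q) = (p/q)(s_p - s_q)$, so the expectation vanishes when $p = q$, and the normalizing constant of $q$ enters only as an overall scale factor.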

Figures (2)

  • Figure 1: The bimodal shape of the unnormalized quartic potential density $f_\theta(x)$ for $\theta=(0, 2, -0.5)$.
  • Figure 2: Posterior-predictive RMSE versus $\gamma$ under four scenarios (clean; $Y$-contamination; $X$-contamination; mixed $X{+}Y$). Error bars show $\pm$ one standard error over replicates.

Theorems & Definitions (16)

  • Definition 1: Stein operator
  • Definition 2: $\gamma$-Stein Operator
  • Theorem 3
  • proof
  • Proposition 4
  • proof
  • Proposition 5
  • proof
  • Remark 6: Independence from normalizing constant
  • Proposition 7
  • ...and 6 more