Score Matching for Estimating Finite Point Processes

Haoqun Cao, Yixuan Zhang, Feng Zhou

TL;DR

This work provides a mathematically rigorous framework for score matching (SM) on finite point processes using Janossy measures, revealing fundamental limitations of prior implicit and autoregressive SM approaches on bounded domains. It introduces weighted score matching (WSM) and an autoregressive variant (AWSM) with provable consistency and finite-sample guarantees in parametric settings, and shows that score matching alone is non-identifiable for nonparametric models. To resolve the resulting normalization ambiguity, the authors add a survival-classification augmentation that yields an entirely integration-free training objective applicable to intensity-based nonparametric models for spatio-temporal data. Empirically, AWSM achieves accuracy comparable to maximum likelihood estimation (MLE) at lower computational cost across synthetic and real temporal and spatio-temporal datasets, including deep point-process models. The framework thus enables scalable, provably sound training of both classical and deep point processes.

Abstract

Score matching estimators have garnered significant attention in recent years because they eliminate the need to compute normalizing constants, thereby mitigating the computational challenges associated with maximum likelihood estimation (MLE). While several studies have proposed score matching estimators for point processes, this work highlights the limitations of these existing methods, which stem primarily from the lack of a mathematically rigorous analysis of how score matching behaves on finite point processes: special random configurations on bounded spaces where many of the usual assumptions and properties of score matching no longer hold. To this end, we develop a formal framework for score matching on finite point processes via Janossy measures and, within this framework, introduce an (autoregressive) weighted score-matching estimator, whose statistical properties we analyze in classical parametric settings. For general nonparametric (e.g., deep) point process models, we show that score matching alone does not uniquely identify the ground-truth distribution due to subtle normalization issues, and we propose a simple survival-classification augmentation that yields a complete, integration-free training objective for any intensity-based point process model in the spatio-temporal case. Experiments on synthetic and real-world temporal and spatio-temporal datasets demonstrate that our method accurately recovers intensities and achieves performance comparable to MLE with better efficiency.
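To make the "no normalizing constants" point concrete, the following is a minimal sketch of classical (Hyvärinen-style) score matching for a one-dimensional Gaussian on the whole real line. It is not the paper's WSM or AWSM estimator, and the names `sm_objective` and `sm_fit` are hypothetical helpers introduced only for illustration. The model is specified through its unnormalized log-density $\log \tilde p(x) = -\lambda (x-\mu)^2/2$, and the objective uses only the first and second derivatives of that expression in $x$.

```python
import random
import statistics


def sm_objective(xs, mu, lam):
    """Empirical score-matching objective for the unnormalized model
    log p~(x) = -lam * (x - mu)**2 / 2. The normalizer is never computed."""
    total = 0.0
    for x in xs:
        score = -lam * (x - mu)   # d/dx log p~(x)
        dscore = -lam             # d^2/dx^2 log p~(x)
        total += 0.5 * score ** 2 + dscore
    return total / len(xs)


def sm_fit(xs):
    """Closed-form minimizer of sm_objective: setting the gradient to zero
    gives mu = sample mean and lam = 1 / (population variance)."""
    mu = statistics.fmean(xs)
    var = statistics.pvariance(xs, mu)
    return mu, 1.0 / var


# Illustrative data (an assumption of this sketch): mean 2.0, std 0.5,
# so the true precision is lam = 1 / 0.25 = 4.
random.seed(0)
data = [random.gauss(2.0, 0.5) for _ in range(5000)]
mu_hat, lam_hat = sm_fit(data)
```

This derivation relies on an integration-by-parts step whose boundary term vanishes on the real line. On a bounded domain that term need not vanish, which is one of the usual properties the abstract says no longer holds for finite point processes.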

Paper Structure

This paper contains 65 sections, 21 theorems, 141 equations, 9 figures, 6 tables.

Key Result

Proposition 4

(Proposition 7.2.I in Daley and Vere-Jones, 2003) For a regular point process on $(0,T)\times S$, there exists a uniquely determined family of conditional probability density functions $p_n(t,\bm s|t_1,\bm s_1,\ldots, t_{n-1},\bm s_{n-1})$ with the exception of $p_1(t,\bm s)$ and associated survivor functions … where $0<t_1<\ldots<t_{n-1}<t$, $(\bm s_1,\ldots, \bm s_{n-1})\in S^{(n-1)}$, $p_n(\cdot, \cdot|t_1,$ …

Figures (9)

  • Figure 1: Ground-truth and learned intensities on two temporal synthetic datasets. Rows: (1) Half-Sin, THP; (2) Exp-decay, THP; (3) Half-Sin, SAHP; (4) Exp-decay, SAHP. Columns: MLE, DSM, AWSM.
  • Figure 2: Ground-truth and learned intensities on two spatio-temporal synthetic datasets. Top: ground-truth intensity; middle: intensities learned by MLE and DSM; bottom: intensity learned by AWSM.
  • Figure 3: Snapshots of the conditional intensity learned by SMASH trained with MLE and AWSM on the Earthquake and CitiBike datasets. From left to right: intensity at different timestamps. Observed events are marked with “$\times$”, whose influence decays over time; brighter regions indicate higher intensity.
  • Figure 4: Learned intensities from THP with MLE and AWSM objectives on the Retweet dataset.
  • Figure 5: Average TLL versus RT for different choices of integration nodes. In each panel, the blue solid line with circular markers corresponds to the MLE estimator, and the red triangle corresponds to the AWSM estimator; error bars indicate one standard deviation over repeated runs. The text labels next to the blue markers indicate the integration-node configuration. For example, $2^2+2$ means using two spatial quadrature nodes in each spatial dimension and two temporal quadrature nodes between any two consecutive event times.
  • ...and 4 more figures

Theorems & Definitions (38)

  • Definition 1
  • Definition 2
  • Definition 3
  • Proposition 4
  • Example 1
  • Definition 5
  • Definition 6
  • Theorem 9
  • Theorem 10
  • Example 2
  • ...and 28 more