PCA for Point Processes

Franck Picard, Vincent Rivoirard, Angelina Roche, Victor Panaretos

TL;DR

PCA for Point Processes develops a population-level, functional PCA framework for replicated point processes by embedding each realization as a random measure via its cumulative mass function. It establishes a Karhunen–Loève expansion for these measures and a Mercer representation for the covariance measure, introducing principal measures that govern latent dynamics. The approach yields explicit eigenstructure results for Poisson and Hawkes processes and provides a fully data-driven, smoothing-free estimator with parametric convergence rates, validated through simulations and diverse applications. The methodology enables interpretable dimension reduction and visualization of variability in replicated point patterns, with broad applicability to seismology, single-cell epigenomics, and neuroscience, and is implemented in the R package pppca.

Abstract

We introduce a novel statistical framework for the analysis of replicated point processes that allows for the study of point pattern variability at a population level. By treating point process realizations as random measures, we adopt a functional analysis perspective and propose a form of functional Principal Component Analysis (fPCA) for point processes. The originality of our method is to base our analysis on the cumulative mass functions of the random measures, which gives us a direct and interpretable analysis. Key theoretical contributions include establishing a Karhunen-Loève expansion for the random measures and a Mercer Theorem for covariance measures. We establish convergence in a strong sense, and introduce the concept of principal measures, which can be seen as latent processes governing the dynamics of the observed point patterns. We propose an easy-to-implement estimation strategy of eigenelements for which parametric rates are achieved. We fully characterize the solutions of our approach to Poisson and Hawkes processes and validate our methodology via simulations and diverse applications in seismology, single-cell biology and neurosciences, demonstrating its versatility and effectiveness. Our method is implemented in the pppca R-package.
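To make the idea of running PCA on cumulative mass functions concrete, here is a minimal Python sketch. It is not the authors' estimator and does not use the pppca package: it assumes simulated homogeneous Poisson processes on $[0,1]$, evaluates each cumulative count function on a regular grid, and eigen-decomposes the empirical covariance with a simple Riemann-sum quadrature. The function names (`simulate_poisson`, `cumulative_counts`, `grid_pca`), the rate, the grid size, and the number of replicates are all illustrative assumptions.

```python
import numpy as np


def simulate_poisson(rate, rng):
    """One homogeneous Poisson process on [0, 1], returned as sorted event times."""
    n_events = rng.poisson(rate)
    return np.sort(rng.uniform(0.0, 1.0, size=n_events))


def cumulative_counts(events, grid):
    """Cumulative mass function N([0, t]) evaluated at each grid point t."""
    return np.searchsorted(events, grid, side="right").astype(float)


def grid_pca(processes, grid, n_components=3):
    """Grid-based PCA of cumulative counts (illustrative, not the paper's estimator)."""
    counts = np.vstack([cumulative_counts(p, grid) for p in processes])
    centered = counts - counts.mean(axis=0)
    step = grid[1] - grid[0]
    # Empirical covariance on the grid; multiplying by `step` approximates
    # the integral operator by a Riemann sum.
    cov = centered.T @ centered / len(processes)
    eigvals, eigvecs = np.linalg.eigh(cov * step)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals = eigvals[order]
    # Rescale so each eigenfunction has unit L2([0, 1]) norm on the grid.
    eigfuns = eigvecs[:, order] / np.sqrt(step)
    # Scores: discretized inner product of each centered curve with each eigenfunction.
    scores = centered @ eigfuns * step
    return eigvals, eigfuns, scores


rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 201)[1:]  # 200 grid points in (0, 1]
processes = [simulate_poisson(20.0, rng) for _ in range(500)]
eigvals, eigfuns, scores = grid_pca(processes, grid)
```

With this discretization, the mean squared score on each axis equals the corresponding eigenvalue by construction. For a homogeneous Poisson process with rate 20, the covariance of the cumulative counts is $20\min(s,t)$, whose leading eigenvalue is $20/(\pi/2)^2 \approx 8.1$, so the first estimated eigenvalue should land near that value.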

Paper Structure (26 sections, 12 theorems, 226 equations, 8 figures)

This paper contains 26 sections, 12 theorems, 226 equations, 8 figures.

Key Result

Proposition 3.1

We suppose that Assumptions eq:hyp2 and lambdajpositive are satisfied and that $K_\Delta$ is continuous. Then, for all $j\geq 1$, the derivative in the distributional sense of $\eta_j$ is a measure, denoted by $\mu_j$, that verifies and, for all $\varphi\in\mathcal{H}_0^1=\{f\in \mathbb L^2(I):f'\in \mathbb L^2(I) \text{ and }f(t) = 0 \text{ for all } t\notin I \}$,

Figures (8)

  • Figure 1: Average eigenfunctions for Poisson processes with different intensity functions over 50 replicates.
  • Figure 2: Average eigenvalues (log-scale) for Poisson processes over 50 replicates. Each dot corresponds to a value of $j \in \{1,\hdots, 10\}$. The empirical average is plotted against the theoretical asymptotic regime of the eigenvalues, $(\int_0^1 \sqrt{w(u)}du)^2/(j \pi - \pi/2)^{2}$, as expected from Theorem \ref{thm:inhPP}. Note that Theorem \ref{thm:inhPP} provides a $(j \pi)^{-2}$ regime. The black line corresponds to the first bisector, so that the points align if the empirical convergence matches the theoretical regime.
  • Figure 3: Average eigenfunctions for Hawkes processes with different transfer functions over 50 replicates. Dotted lines: asymptotic eigenfunctions $t\longmapsto\sqrt{2}\sin(\pi(2j-1)t/2)$ (see Theorem \ref{Hawkes-ed-sol}).
  • Figure 4: Average eigenvalues (log-scale) for Hawkes processes over 50 replicates. Each dot corresponds to a value of $j \in \{1,\hdots, 50\}$. The empirical average is plotted against the theoretical asymptotic regime of the eigenvalues, $w_1/(j \pi - \pi/2)^{2}$, as expected from Theorem \ref{Hawkes-ed-sol}. The black line corresponds to the first bisector, so that the points align if the empirical convergence matches the theoretical regime.
  • Figure 5: A: Raster plot of earthquakes during the period 2013-2023. Each line corresponds to a city and each dot to an earthquake occurrence. Breakpoint dates (grey vertical lines) correspond to 2017.07.16 and 2020.11.01. B: Percentage of variance according to the number of eigenelements (log scale). C: First rescaled eigenfunction $\sqrt{\widehat{\lambda}_1}\widehat{\eta}_1$ according to the date of earthquakes (left). Breakpoint dates (grey vertical lines) correspond to 2017.07.16 and 2020.11.01. D: Plot of PCA scores for the first axis $\widehat{\xi}_{i1}$ according to the number of occurrences $W([0,t])$.
  • ...and 3 more figures
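
The closed forms quoted in the captions of Figures 3 and 4, eigenfunctions $\sqrt{2}\sin(\pi(2j-1)t/2)$ with eigenvalues proportional to $(j\pi - \pi/2)^{-2}$, coincide with the classical eigenpairs of the kernel $w\min(s,t)$ on $[0,1]$, which is the covariance of the cumulative counts of a homogeneous Poisson process with rate $w$. The Python sketch below checks this numerically under that simplifying assumption; it does not reproduce the paper's theorems for inhomogeneous Poisson or Hawkes processes. The rate `w`, the grid size `m`, and the midpoint discretization are illustrative choices.

```python
import numpy as np

w = 5.0   # assumed constant intensity (illustrative value)
m = 400   # number of midpoint grid cells on [0, 1]
step = 1.0 / m
t = (np.arange(m) + 0.5) * step

# Covariance kernel of the cumulative counts of a rate-w homogeneous Poisson process.
kernel = w * np.minimum.outer(t, t)

# Discretize the integral operator with the midpoint rule and diagonalize it.
eigvals, eigvecs = np.linalg.eigh(kernel * step)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order] / np.sqrt(step)  # unit L2([0, 1]) norm on the grid

# Closed forms for the first five eigenpairs.
j = np.arange(1, 6)
theory_vals = w / (j * np.pi - np.pi / 2) ** 2
theory_funs = np.sqrt(2) * np.sin(np.pi * (2 * j[None, :] - 1) * t[:, None] / 2)

# Eigenvectors are defined up to sign, so align each one with its closed form.
signs = np.sign(np.sum(eigvecs[:, :5] * theory_funs, axis=0))
max_fun_error = np.max(np.abs(eigvecs[:, :5] * signs - theory_funs))
max_val_error = np.max(np.abs(eigvals[:5] - theory_vals) / theory_vals)
```

Both `max_fun_error` and `max_val_error` should be small and shrink as `m` grows, which is the same comparison the figures make between empirical eigenelements and the theoretical regime.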

Theorems & Definitions (24)

  • Proposition 3.1
  • Theorem 3.2: Karhunen-Loève Theorem for point processes
  • Remark 3.3
  • Theorem 3.4: Mercer's Theorem for $C_\Delta$
  • Theorem 4.1
  • Remark 4.2
  • Remark 4.3
  • Theorem 4.4: Theorems 4.3.1 and 4.6.2 of Zettl-book
  • Remark 4.5
  • Definition 4.6
  • ...and 14 more