Density estimation from batched broken random samples

Hancheng Bi; Bernhard Schmitzer; Thilo D. Stier

Abstract

The broken random sample problem was first introduced by DeGroot, Feder, and Goel (1971, Ann. Math. Statist.): in each observation (batch), a random sample of $M$ i.i.d. point pairs $((X_i,Y_i))_{i=1}^M$ is drawn from a joint distribution with density $p(x,y)$, but we can observe only the unordered multisets $(X_i)_{i=1}^M$ and $(Y_i)_{i=1}^M$ separately; that is, the pairing information is lost. For large $M$, inferring $p$ from a single observation has been shown to be essentially impossible. In this paper, we propose a parametric method based on a pseudo-log-likelihood to estimate $p$ from $N$ i.i.d. broken sample batches, and we prove a fast convergence rate in $N$ for our estimator that is uniform in $M$, under mild assumptions.
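To make the observation model concrete, the following minimal sketch simulates $N$ broken batches of size $M$. It is an illustration under stated assumptions, not the authors' code: the joint density is taken to be a standard bivariate normal with correlation `rho` (as in the paper's second numerical example), and the function name `sample_broken_batches` and its parameters are hypothetical.

```python
import numpy as np


def sample_broken_batches(n_batches, batch_size, rho, seed=0):
    """Simulate N batches of M broken pairs (illustrative sketch).

    Assumption: p(x, y) is a bivariate normal with zero means, unit
    variances and correlation rho. Any other joint density could be
    substituted for the sampling step.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])

    # Shape (N, M, 2): the paired samples, which are NOT observable
    # in the broken-sample setting.
    pairs = rng.multivariate_normal(np.zeros(2), cov,
                                    size=(n_batches, batch_size))

    # Break the pairing: shuffle the X and Y values of each batch
    # independently, so only the two multisets per batch remain.
    xs = rng.permuted(pairs[..., 0], axis=1)
    ys = rng.permuted(pairs[..., 1], axis=1)
    return xs, ys, pairs


# Example call; N = 100 and M = 50 are arbitrary illustrative values.
xs, ys, pairs = sample_broken_batches(100, 50, rho=-0.5)
```

Within each batch, `xs` and `ys` contain the same values as the paired sample, but row positions no longer match, so any estimator may only use the two multisets per batch.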

Paper Structure (14 sections, 31 theorems, 71 equations, 4 figures)

Introduction
Problem statement and related work
Problem statement.
Applications.
Outline
Problem description and main results
Preliminaries
Non-asymptotic convergence results
Concentration of $f_M^N(\theta)$
Concentration of $\nabla f_M^N(\theta)$
Convergence of the estimator
Numerical examples
Points colocalisation on 2D Torus
Covariance estimation for bivariate normal distribution

Key Result

Theorem 1.6

Figures (4)

Figure 1: Stimulated emission depletion (STED) microscopy image, part of dohrke2024puck. Cells were stained for the HA-tag (green) and Mic60 (purple).
Figure 2: Numerical experiment for $\sigma^* = 0.1$. The first row shows an example point cloud from a single batch. Each blue line depicts $f_M^N(\sigma)$ calculated from samples with $M$ and $N$ as indicated by the corresponding column and row. There are $50$ independent samples per plot. Orange points denote the minima, and the histograms below show their distribution. The red line is $f_\infty(\sigma) = \tfrac{1}{2}\lVert p^\sigma - p^{\sigma^*}\rVert^2_{L^2} + \tfrac{1}{2} - \tfrac{1}{2}\lVert p^{\sigma^*}\rVert^2_{L^2}$.
Figure 3: The coefficient of variation of $\sigma_M^N$, computed from $100$ simulations, for varying values of $\sigma^*, N$, and $M$.
Figure 4: Numerical experiment with broken random samples from a bivariate normal distribution with $\rho^* = -0.5$. The first row shows an example from a single batch; the purple points are “unbroken” $(x,y)$ pairs, whereas in the broken-sample setting only the blue and red marginal points on the axes are observable. The remaining panels are analogous to Figure 2, and the red curve is $f_\infty(\rho) = \tfrac{1}{2}\lVert p^\rho - p^{\rho^*}\rVert^2_{L^2(\mu\otimes\nu)} + \tfrac{1}{2} - \tfrac{1}{2}\lVert p^{\rho^*}\rVert^2_{L^2(\mu\otimes\nu)}$.
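
The red reference curves in Figures 2 and 4 share one form. As a reading aid that uses only the formulas stated in the captions, write $\theta$ for the parameter ($\sigma$ or $\rho$), $\theta^*$ for its true value, and $\lVert\cdot\rVert$ for the respective norm ($L^2$ in Figure 2, $L^2(\mu\otimes\nu)$ in Figure 4):

$$
f_\infty(\theta)
= \tfrac{1}{2}\lVert p^{\theta} - p^{\theta^*}\rVert^2 + \tfrac{1}{2} - \tfrac{1}{2}\lVert p^{\theta^*}\rVert^2
\;\ge\; \tfrac{1}{2} - \tfrac{1}{2}\lVert p^{\theta^*}\rVert^2
= f_\infty(\theta^*).
$$

The first term is non-negative and vanishes at $\theta = \theta^*$, so the red curve attains its minimum at the true parameter. The minimizer is unique if distinct parameters give distinct densities, which is an identifiability assumption made here for illustration rather than a statement taken from the captions.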

Theorems & Definitions (58)

Definition 1.2
Remark 1.3: Breaking the samples
Remark 1.4
Remark 1.5: Comparison with TransferOp
Theorem 1.6
Remark 1.7: Behaviour of minimizers of $f_M$
Theorem 1.8
Theorem 1.9
Theorem 2.1: Differentiability of parametrised integrals amann2009analysis
Definition 2.2: $\psi_1$-norm
...and 48 more
