Density estimation from batched broken random samples

Hancheng Bi, Bernhard Schmitzer, Thilo D. Stier

Abstract

The broken random sample problem was first introduced by DeGroot, Feder, and Gole (1971, Ann. Math. Statist.): in each observation (batch), a random sample of $M$ i.i.d. point pairs $ ((X_i,Y_i))_{i=1}^M$ is drawn from a joint distribution with density $p(x,y)$, but we can observe only the unordered multisets $(X_i)_{i=1}^M$ and $(Y_i)_{i=1}^M$ separately; that is, the pairing information is lost. For large $M$, inferring $p$ from a single observation has been shown to be essentially impossible. In this paper, we propose a parametric method based on a pseudo-log-likelihood to estimate $p$ from $N$ i.i.d. broken sample batches, and we prove a fast convergence rate in $N$ for our estimator that is uniform in $M$, under mild assumptions.
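To make the observation model concrete, the following is a minimal sketch of how batched broken samples could be simulated. It is not the authors' code. It assumes a bivariate normal $p$ with standard marginals and correlation `rho`, in the spirit of the experiment described for Figure 4; the function name `broken_batches` and its parameters are hypothetical. Sorting each coordinate within a batch is used here as one convenient representation of the unordered multisets, since it discards the pairing.

```python
import numpy as np

def broken_batches(n_batches, m, rho, seed=0):
    """Simulate N = n_batches broken sample batches of M = m pairs each.

    Assumption (illustrative only): pairs are drawn from a bivariate
    normal with standard marginals and correlation rho.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    # pairs has shape (n_batches, m, 2): M i.i.d. pairs (X_i, Y_i) per batch
    pairs = rng.multivariate_normal(np.zeros(2), cov, size=(n_batches, m))
    xs = pairs[..., 0]
    ys = pairs[..., 1]
    # Sorting each batch independently destroys the pairing information:
    # only the multisets {X_i} and {Y_i} remain observable.
    return np.sort(xs, axis=1), np.sort(ys, axis=1)

xs, ys = broken_batches(n_batches=100, m=20, rho=-0.5)
print(xs.shape, ys.shape)
```

Each row of `xs` and `ys` is one batch; the estimator would see only these two arrays, never the original pairs.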

Paper Structure (14 sections, 31 theorems, 71 equations, 4 figures)

Key Result

Theorem 1.6

Figures (4)

  • Figure 1: Stimulated emission depletion (STED) microscopy image, part of [dohrke2024puck]. Cells were stained for the HA-tag (green) and Mic60 (purple).
  • Figure 2: Numerical experiment for $\sigma^* = 0.1$. The first row shows an example point cloud from a single batch. Each blue line depicts $f_M^N(\sigma)$ computed from samples with $M$ and $N$ as indicated by the corresponding column and row. There are $50$ independent samples per plot. Orange points mark the minima, and the histograms below show their distribution. The red line is $f_\infty(\sigma) = \tfrac{1}{2}\lVert p^\sigma - p^{\sigma^*}\rVert^2_{L^2} + \tfrac{1}{2} - \tfrac{1}{2}\lVert p^{\sigma^*}\rVert^2_{L^2}$.
  • Figure 3: The coefficient of variation of $\sigma_M^N$, computed from $100$ simulations, for varying values of $\sigma^*, N$, and $M$.
  • Figure 4: Numerical experiment with broken random samples from a bivariate normal distribution with $\rho^* = -0.5$. The first row shows an example from a single batch; the purple points are “unbroken” $(x,y)$ pairs, whereas in the broken-sample setting only the blue and red marginal points on the axes are observable. The remaining panels are analogous to Figure 2, and the red curve is $f_\infty(\rho) = \tfrac{1}{2}\lVert p^\rho - p^{\rho^*}\rVert^2_{L^2(\mu\otimes\nu)} + \tfrac{1}{2} - \tfrac{1}{2}\lVert p^{\rho^*}\rVert^2_{L^2(\mu\otimes\nu)}$.

Theorems & Definitions (58)

  • Definition 1.2
  • Remark 1.3: Breaking the samples
  • Remark 1.4
  • Remark 1.5: Comparison with TransferOp
  • Theorem 1.6
  • Remark 1.7: Behaviour of minimizers of $f_M$
  • Theorem 1.8
  • Theorem 1.9
  • Theorem 2.1: Differentiability of parametrised integrals [amann2009analysis]
  • Definition 2.2: $\psi_1$-norm
  • ...and 48 more