Distribution Testing Meets Sum Estimation
Pinki Pradhan, Sampriti Roy
TL;DR
The paper addresses estimating the total weight $W=\sum_{i=1}^n w(i)$ of a universe with non-increasing weights, achieving a $(1\pm \epsilon)$-approximation using sublinear samples under non-adaptive conditional sampling. It leverages conditional and uniform sampling within oblivious interval partitions to identify structure in the weight distribution and to bound estimation error, with extensions to unimodal distributions by partitioning at the minimum. The authors deliver concrete algorithmic guarantees: for monotone weights, $O\left( \frac{1}{\epsilon^3}\log n + \frac{1}{\epsilon^6} \right)$ weighted conditional samples and $O\left( \frac{1}{\epsilon^3}\log n \right)$ uniform conditional samples, matching the $\Omega(\log n)$ lower bound from prior work; for unimodal cases, the same order applies plus $O(\log n)$ evaluation queries to locate the minimum. They further study support-size estimation under a hybrid model that combines weighted and uniform access, achieving accurate estimates with polylogarithmic sample requirements. Overall, the work advances understanding of sum estimation under restricted access models in structured distributions and provides practical sublinear algorithms with provable guarantees.
Abstract
We study the problem of estimating the sum of $n$ elements, each with weight $w(i)$, in a structured universe. Our goal is to estimate $W = \sum_{i=1}^n w(i)$ within a $(1 \pm ε)$ factor using a sublinear number of samples, assuming weights are non-increasing, i.e., $w(1) \geq w(2) \geq \dots \geq w(n)$. The sum estimation problem is well-studied under different access models to the universe $U$. However, to the best of our knowledge, nothing is known about the sum estimation problem using non-adaptive conditional sampling. In this work, we explore the sum estimation problem using non-adaptive conditional weighted and non-adaptive conditional uniform samples, assuming that the underlying distribution ($D(i)=w(i)/W$) is monotone. We also extend our approach to to the case where the underlying distribution of $U$ is unimodal. Additionally, we consider support size estimation when $w(i) = 0$ or $w(i) \geq W/n$, using hybrid sampling (both weighted and uniform) to access $U$. We propose an algorithm to estimate $W$ under the non-increasing weight assumption, using $O(\frac{1}{ε^3} \log{n} + \frac{1}{ε^6})$ non-adaptive weighted conditional samples and $O(\frac{1}{ε^3} \log{n})$ uniform conditional samples. Our algorithm matches the $Ω(\log{n})$ lower bound by \cite{ACK15}. For unimodal distributions, the sample complexity remains similar, with an additional $O(\log{n})$ evaluation queries to locate the minimum weighted point in the domain. For estimating the support size $k$ of $U$, where weights are either $0$ or at least $W/n$, our algorithm uses $O\big( \frac{\log^3(n/ε)}{ε^8} \cdot \log^4 \frac{\log(n/ε)}ε \big)$ uniform samples and $O\big( \frac{\log(n/ε)}{ε^2} \cdot \log \frac{\log(n/ε)}ε \big)$ weighted samples to output $\hat{k}$ satisfying $k - 2εn \leq \hat{k} \leq k + εn$.
