Tighter Confidence Intervals under Without Replacement Sampling via Empirical Rate Functions

Shubhanshu Shekhar; Aaditya Ramdas

Abstract

We consider the problem of constructing confidence intervals (CIs) for the population mean of $N$ values $\{x_1, \ldots, x_N\} \subset \Sigma^N$ based on a random sample of size $n$, denoted by $X^n \equiv (X_1, \ldots, X_n)$, drawn uniformly without replacement (WoR). We begin by focusing on the finite alphabet ($|\Sigma| = k < \infty$) and moderate accuracy ($\log(1/\alpha_N) \gg (k+1)\log N$) regime, and derive a fundamental lower bound on the width of any level-$(1-\alpha_N)$ CI in terms of the inverse of the WoR rate functions from the theory of large deviations. Guided by this lower bound, we propose a new level-$(1-\alpha_N)$ CI using an empirical inverse rate function, and show that in certain asymptotic regimes the width of this CI matches the lower bound up to constants. We also derive a dual formulation of the inverse rate function that enables efficient computation of our proposed CI. We then move beyond the finite alphabet case and use a Bernoulli coupling idea to construct an almost sure CI for $\Sigma = [0,1]$, and a conceptually simple nonasymptotic CI for the case of $\Sigma$ being a $(2,D)$ smooth Banach space. For both finite and general alphabets, our results employ classical large deviation techniques in novel ways, thus establishing new connections between estimation under WoR sampling and the theory of large deviations.
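To make the WoR setup concrete, the following minimal sketch draws a uniform WoR sample from a population in $[0,1]$ and builds a classical baseline interval from Serfling's refinement of Hoeffding's inequality. This is not the empirical-rate-function CI proposed in the paper; the function names, the synthetic population, and the choices $N=1000$, $n=100$, $\alpha=0.05$ are assumptions made purely for illustration.

```python
import math
import random


def hoeffding_serfling_halfwidth(n, N, alpha):
    """Half-width of a two-sided level-(1 - alpha) CI for values in [0, 1].

    Baseline only: Serfling's WoR refinement of Hoeffding's inequality,
    combined with a union bound over the two tails.
    """
    if not (1 <= n <= N):
        raise ValueError("need 1 <= n <= N")
    fpc = 1.0 - (n - 1) / N  # finite-population factor from Serfling's bound
    return math.sqrt(fpc * math.log(2.0 / alpha) / (2.0 * n))


def wor_ci(population, n, alpha, rng):
    """Draw a uniform WoR sample of size n and return (lower, upper)."""
    sample = rng.sample(population, n)  # random.sample draws without replacement
    mean = sum(sample) / n
    half = hoeffding_serfling_halfwidth(n, len(population), alpha)
    # Clip to [0, 1], since the population values live there.
    return max(0.0, mean - half), min(1.0, mean + half)


# Hypothetical population of N = 1000 values in [0, 1].
rng = random.Random(0)
population = [rng.random() for _ in range(1000)]
lower, upper = wor_ci(population, n=100, alpha=0.05, rng=rng)
print(f"baseline WoR CI: [{lower:.3f}, {upper:.3f}]")
```

The finite-population factor shrinks the interval as $n$ approaches $N$, which is the basic advantage of WoR sampling that the paper's rate-function analysis sharpens.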

Paper Structure (49 sections, 17 theorems, 160 equations, 3 figures)

Introduction
Preliminaries
Prior Work
Overview of Our Results
Main Results
Lower Bound for Finite Alphabet
Proposed CI for Finite Alphabet
A Duality Result
CIs for General Alphabets
An "Almost Sure" CI for $\Sigma = [0,1]$
WoR CI in Smooth Banach Spaces
Comparison with Schneider (2016)
Conclusion
Additional Background
List of Key Symbols
...and 34 more sections

Key Result

Theorem 3.4

Let $\mathcal{C}$ denote a method of constructing confidence intervals for the population mean $\mu_N$ of $\mathcal{X}_N = \{x_1, \ldots, x_N\} \subset \Sigma^N$. With $\widehat{P}_N \equiv \widehat{P}(\mathcal{X}_N) \coloneqq \frac{1}{N} \sum_{x \in \mathcal{X}_N} \delta_{x}$, define […]. Let $\widetilde{P}_n = \mathop{\mathrm{argmin}}\limits_{t \in \mathcal{T}_n} I(t, \beta, \widehat{P}_N)$ denote […]

Figures (3)

Figure 1: Comparison of the distribution of widths (over $200$ trials) of the almost sure CI of Equation \ref{eq:asympCI-def} (denoted by AS-CI), the CLT-based asymptotically valid CI, and the empirical Bernstein CI derived by Bardenet and Maillard (2015) (denoted by BM2015). The populations have size $N=1000$ and are generated from $\mathrm{Beta}(a,b)$ distributions, with WoR samples of size $500$. In both examples, the width of our almost sure CI lies midway between that of the asymptotically valid CLT-CI and that of the nonasymptotic CI of Bardenet and Maillard (2015).
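For readers who want to reproduce the flavor of this comparison, the sketch below builds only the CLT-based interval with the standard finite-population correction on a synthetic Beta population. The Beta parameters $(a,b)=(2,5)$, the seed, and the function name are assumptions; the source does not state which $(a,b)$ values were used, and the AS-CI and empirical Bernstein CI are not implemented here.

```python
from statistics import NormalDist

import numpy as np


def clt_wor_ci(sample, N, alpha):
    """Asymptotic (CLT) CI for the population mean under WoR sampling.

    Uses the usual finite-population correction sqrt(1 - n/N) on the
    standard error; valid only asymptotically, not at finite n.
    """
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    mean = float(sample.mean())
    std = float(sample.std(ddof=1))  # sample standard deviation
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * std / np.sqrt(n) * np.sqrt(1.0 - n / N)
    return mean - half, mean + half


# Hypothetical setup mirroring the caption: N = 1000, n = 500.
rng = np.random.default_rng(0)
population = rng.beta(2.0, 5.0, size=1000)  # (a, b) = (2, 5) is an assumption
sample = rng.choice(population, size=500, replace=False)  # WoR draw
lo, hi = clt_wor_ci(sample, N=1000, alpha=0.05)
print(f"CLT WoR CI: [{lo:.4f}, {hi:.4f}], width = {hi - lo:.4f}")
```

With $n/N = 0.5$ the correction factor is $\sqrt{0.5} \approx 0.71$, so the interval is noticeably narrower than the with-replacement CLT interval built from the same sample.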
Figure 3: The first figure numerically verifies the correctness of the dual formulation of $J(m) \equiv J(P, \beta, m) \coloneqq \max\{ J_+(P, \beta, m), J_-(P, \beta, m)\}$ for two randomly generated distributions supported on a finite alphabet of size $4$. The dashed curves show $J(m)$ computed by solving the dual derived in Theorem \ref{theorem:dual-finite}, and the solid curves show $J(m)$ computed by directly solving the primal problem (Definition \ref{def:complexity-function}). The other two figures compare the widths (over $200$ trials) of our proposed CI (gray histogram) with those of the Hoeffding, Bernstein, and empirical Bernstein (Bardenet and Maillard, 2015) CIs, on a problem with $|\Sigma|=10$, $N=1000$, $\beta=0.35$, and $\alpha \in \{10^{-5}, 10^{-10}\}$. As discussed in Remark \ref{remark:empirical-and-population-inverse-information-projections}, in the regime of small enough $\alpha$, our CI of Definition \ref{def:CI-finite} performs better than the other three methods.
Figure 4: (Left) We plot the ratio of the width of the CI of Schneider (2016), denoted by $\epsilon_{n, \text{Sch}}$, to that of the CI obtained in Theorem \ref{theorem:banach-space-CI}, denoted by $\epsilon_n$, as the sample size $n$ varies from $1$ to $N$. We fix $\alpha=0.05$ and consider five values of $N \in \{200, 1000, 2000, 10000, 20000\}$. In all instances, the CI derived in Theorem \ref{theorem:banach-space-CI} uniformly dominates that of Schneider (2016), and the improvement is more pronounced for smaller values of $N$. (Right) The solid black line is the average value of $\|\mu_N-\mu_n\|_k$ over $100$ independent trials. Here $\mu_N$ and $\mu_n$ denote the empirical kernel mean embeddings defined using $N=1000$ MNIST images and a WoR sample of size $n\leq N$, respectively, with the Matérn-$3/2$ kernel. As the plot indicates, the CI of Theorem \ref{theorem:banach-space-CI} is tighter than the CI of Schneider (2016), shown by the dashed curve, over the entire range $n \in \{1, \ldots, N\}$.
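The quantity $\|\mu_N-\mu_n\|_k$ in the right panel can be computed from Gram-matrix averages alone, since $\|\mu_N-\mu_n\|_k^2 = \frac{1}{N^2}\sum_{i,j} k(x_i,x_j) - \frac{2}{Nn}\sum_{i}\sum_{j \in S} k(x_i,x_j) + \frac{1}{n^2}\sum_{i,j \in S} k(x_i,x_j)$, where $S$ is the set of sampled indices. The sketch below is a minimal illustration under stated assumptions: it uses synthetic Gaussian vectors in place of MNIST images, a lengthscale of $4.0$ chosen arbitrarily, and hypothetical function names.

```python
import numpy as np


def matern32_gram(X, Y, lengthscale=1.0):
    """Matern-3/2 Gram matrix: k(x, y) = (1 + s) * exp(-s), s = sqrt(3) * r / l."""
    sq_dists = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    r = np.sqrt(np.maximum(sq_dists, 0.0))  # guard against tiny negative values
    s = np.sqrt(3.0) * r / lengthscale
    return (1.0 + s) * np.exp(-s)


def embedding_distance(K, idx):
    """RKHS norm ||mu_N - mu_n||_k from the full N x N Gram matrix K
    and the indices idx of the WoR sample."""
    sq = K.mean() - 2.0 * K[:, idx].mean() + K[np.ix_(idx, idx)].mean()
    return float(np.sqrt(max(sq, 0.0)))


# Synthetic stand-in for the image data (assumption: 200 points in 16 dims).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
K = matern32_gram(X, X, lengthscale=4.0)

# Distance between full and subsample embeddings for a few sample sizes.
for n in (10, 50, 200):
    idx = rng.choice(X.shape[0], size=n, replace=False)  # WoR indices
    print(n, embedding_distance(K, idx))
```

When $n = N$ the sample is the whole population, so the distance is zero up to floating-point error, which matches the behavior at the right end of the plotted curve.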

Theorems & Definitions (38)

Definition 3.1
Remark 3.2
Definition 3.3: Moderate Accuracy Regime
Theorem 3.4: Width lower bound
Remark 3.5
Definition 3.6
Theorem 3.7
Remark 3.8
Theorem 3.9
Remark 3.10
...and 28 more
