Estimation of relative risk, odds ratio and their logarithms with guaranteed accuracy and controlled sample size ratio
Luis Mendo
TL;DR
The paper addresses estimation of the relative risk $\theta$ (RR), the odds ratio $\psi$ (OR), and their logarithms (LRR, LOR) from two populations with binary outcomes. It introduces two-stage inverse binomial sampling (IBS) procedures that yield unbiased estimators with guaranteed accuracy, measured as relative mean-square error for RR and OR and as mean-square error for LRR and LOR. The procedures also control the ratio of average sample sizes between the two populations, and they apply to both element sampling and group sampling. A key novelty is the use of an initial pilot stage to adapt the second-stage sample sizes $s_1, s_2$ via a curvature-based design, so that the target error $A$ is met for all $p_1, p_2 \in (0,1)$; efficiency relative to the Cramér-Rao bound is shown to be close to $1$ for small target errors. For group sampling, the paper gives explicit approximations for the average number of groups and the associated efficiency, and simulations validate the theoretical bounds. Where needed, the methods use a Bernoulli factory to generate success/failure sequences with transformed parameters, which allows estimation of a broad class of functions of $p_1$ and $p_2$ beyond RR and OR.
Abstract
Given two populations from which independent binary observations are taken with parameters $p_1$ and $p_2$ respectively, estimators are proposed for the relative risk $p_1/p_2$, the odds ratio $p_1(1-p_2)/(p_2(1-p_1))$ and their logarithms. The sampling strategy used by the estimators is based on two-stage sequential sampling applied to each population, where the sample sizes of the second stage are computed from the results observed in the first stage. The estimators guarantee that the relative mean-square error, or the mean-square error for the logarithmic versions, is less than a target value for any $p_1, p_2 \in (0,1)$, and the ratio of average sample sizes from the two populations is close to a prescribed value. The estimators can also be used with group sampling, whereby samples are taken in batches of fixed size from the two populations simultaneously, each batch containing samples from the two populations. The efficiency of the estimators with respect to the Cramér-Rao bound is good, and in particular it is close to $1$ for small values of the target error.
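To make the sampling idea concrete, the following is a minimal single-stage sketch of inverse binomial sampling applied to the relative risk $p_1/p_2$. It is not the two-stage procedure proposed in the paper: the second-stage sample-size adaptation is omitted, and the function names and the fixed success counts `r1`, `r2` are assumptions made only for illustration. It relies on two standard facts: if $N$ observations are needed to obtain $r \geq 2$ successes, then $(r-1)/(N-1)$ is unbiased for $p$, and $N/r$ is unbiased for $1/p$. Since the two populations are sampled independently, the product of these estimators is unbiased for $p_1/p_2$.

```python
import random


def inverse_binomial_sample(p, r, rng):
    """Draw Bernoulli(p) observations until r successes occur.

    Returns the total number of observations N (a random quantity).
    """
    n = 0
    successes = 0
    while successes < r:
        n += 1
        if rng.random() < p:
            successes += 1
    return n


def rr_estimate(p1, p2, r1, r2, rng):
    """Single-stage estimate of the relative risk p1/p2 (illustrative only).

    r1 must be at least 2 so that (r1 - 1) / (N1 - 1) is well defined.
    """
    n1 = inverse_binomial_sample(p1, r1, rng)
    n2 = inverse_binomial_sample(p2, r2, rng)
    p1_hat = (r1 - 1) / (n1 - 1)  # unbiased for p1
    inv_p2_hat = n2 / r2          # unbiased for 1/p2
    return p1_hat * inv_p2_hat    # unbiased for p1/p2 by independence


def relative_mse(estimates, true_value):
    """Empirical relative mean-square error: mean of (estimate/true - 1)^2."""
    return sum((e / true_value - 1.0) ** 2 for e in estimates) / len(estimates)


if __name__ == "__main__":
    rng = random.Random(1)
    p1, p2 = 0.3, 0.5  # hypothetical parameters, chosen for the demo
    estimates = [rr_estimate(p1, p2, 20, 20, rng) for _ in range(2000)]
    print("empirical mean:", sum(estimates) / len(estimates))
    print("empirical relative MSE:", relative_mse(estimates, p1 / p2))
```

In this simplified version the relative error depends on $p_1$ and $p_2$ only mildly, because the number of successes rather than the number of observations is fixed. The estimators in the paper go further by choosing the second-stage sizes from first-stage results so that the error target holds for all $p_1, p_2 \in (0,1)$ while the sample-size ratio is controlled.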
