
Recursive Neyman Algorithm for Optimum Sample Allocation under Box Constraints on Sample Sizes in Strata

Jacek Wesołowski, Robert Wieczorkowski, Wojciech Wójciak

Abstract

The optimum sample allocation in stratified sampling is one of the basic issues of survey methodology. It is the procedure of dividing the overall sample size into strata sample sizes in such a way that, for given sampling designs in strata, the variance of the stratified $\pi$ estimator of the population total (or mean) for a given study variable attains its minimum. In this work, we consider the optimum allocation of a sample under lower and upper bounds imposed jointly on sample sizes in strata. We consider a variance function of a generic form that, in particular, covers the case of simple random sampling without replacement in strata. The goal of this paper is twofold. First, we establish (using the Karush-Kuhn-Tucker conditions) a generic form of the optimal solution, the so-called optimality conditions. Second, based on the established optimality conditions, we derive an efficient recursive algorithm, named RNABOX, which solves the allocation problem under study. RNABOX can be viewed as a generalization of the classical recursive Neyman allocation algorithm, a popular tool for optimum allocation when only upper bounds are imposed on strata sample sizes. We implement RNABOX in R as part of our package stratallo, which is available from the Comprehensive R Archive Network (CRAN) repository.
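
The abstract describes RNABOX as a generalization of the recursive Neyman allocation (RNA) used when only upper bounds are imposed. The Python sketch below is our own reading of that idea, not the stratallo implementation. It assumes the generic objective $\sum_h A_h^2/x_h$ (simple random sampling without replacement gives this form with $A_h = N_h S_h$, up to an additive constant), $A_h > 0$, $0 < m_h < M_h$ and $\sum_h m_h \le n \le \sum_h M_h$. The function names `rna` and `rnabox` and the dictionary inputs `A`, `m`, `M` are hypothetical. RNA repeatedly fixes at $M_h$ the strata whose Neyman allocation reaches the upper bound. The outer loop moves strata whose RNA allocation falls to or below $m_h$ into a take-min set and reruns RNA on the remaining strata with the reduced sample size.

```python
def rna(n, A, M):
    """Recursive Neyman allocation with upper bounds only (sketch).

    Minimizes sum_h A[h]**2 / x[h] subject to sum_h x[h] == n and x[h] <= M[h].
    Assumes A[h] > 0 and 0 < n <= sum(M.values()).
    """
    take_max = set()
    while True:
        rest = [h for h in A if h not in take_max]
        if not rest:  # every stratum is take-max, which forces n == sum(M)
            return {h: M[h] for h in A}
        # Neyman threshold s for the strata that are not (yet) take-max
        s = sum(A[h] for h in rest) / (n - sum(M[h] for h in take_max))
        new = {h for h in rest if A[h] / s >= M[h]}
        if not new:
            x = {h: M[h] for h in take_max}
            x.update({h: A[h] / s for h in rest})
            return x
        take_max |= new


def rnabox(n, A, m, M):
    """Box-constrained recursion built on top of rna() (hypothetical sketch).

    Assumes 0 < m[h] < M[h] and sum(m) <= n <= sum(M).
    """
    take_min = set()
    while True:
        rest = [h for h in A if h not in take_min]
        budget = n - sum(m[h] for h in take_min)
        x = rna(budget, {h: A[h] for h in rest}, {h: M[h] for h in rest})
        # strata whose upper-bounded Neyman allocation reaches or violates m_h
        new_min = {h for h in rest if x[h] <= m[h]}
        if not new_min:
            x.update({h: m[h] for h in take_min})
            return x
        take_min |= new_min
```

For instance, `rnabox(15, {0: 1.0, 1: 4.0, 2: 15.0}, {0: 1, 1: 1, 2: 1}, {0: 10, 1: 10, 2: 10})` gives $x = (1, 4, 10)$: the third stratum is take-max, the first is take-min, and the second receives its Neyman share. In this sketch strata never leave the take-min set, and that set grows on every non-final pass, so the outer loop runs at most $|\mathcal H| + 1$ times.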

Paper Structure (25 sections, 7 theorems, 56 equations, 4 figures, 6 tables, 4 algorithms)

Key Result

Theorem 3.1

Optimization Problem \ref{prob} has a unique optimal solution. A point $\mathbf x^* \in {\mathbb R}_+^{\lvert \mathcal{H} \rvert}$ is the solution to optimization Problem \ref{prob} if and only if $\mathbf x^* = \mathbf x^{(\mathcal{L}^*,\, \mathcal{U}^*)}$, with disjoint $\mathcal{L}^*,\, \mathcal{U}^* \subseteq \mathcal{H}$ …
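
Theorem 3.1 characterizes the optimum through a pair of disjoint sets $(\mathcal L^*, \mathcal U^*)$, and its full statement is cut off above. The sketch below assumes the usual Karush-Kuhn-Tucker threshold form for the objective $\sum_h A_h^2/x_h$ under box constraints. With $s = \sum_{h \notin \mathcal L \cup \mathcal U} A_h \big/ \big(n - \sum_{h\in\mathcal L} m_h - \sum_{h\in\mathcal U} M_h\big)$, strata in $\mathcal L$ satisfy $A_h/s \le m_h$, strata in $\mathcal U$ satisfy $A_h/s \ge M_h$, and the remaining strata satisfy $m_h < A_h/s < M_h$. The code enumerates all $3^{|\mathcal H|}$ labelings of a tiny example and keeps the pairs that meet these conditions. The helpers `x_from_sets`, `is_optimal_pair` and `optimal_pairs` are hypothetical names, and brute force is meant only as a check, not for real populations.

```python
from itertools import product


def x_from_sets(n, A, m, M, L, U):
    """Hypothetical helper: allocation x^(L,U) and its threshold s.

    Strata in L take m[h], strata in U take M[h], the rest take A[h] / s with
    s = A(rest) / (n - m(L) - M(U)). Returns (None, None) if infeasible.
    """
    rest = [h for h in A if h not in L and h not in U]
    budget = n - sum(m[h] for h in L) - sum(M[h] for h in U)
    x = {h: m[h] for h in L}
    x.update({h: M[h] for h in U})
    if not rest:  # no Neyman strata: the bounds alone must add up to n
        return (x, None) if abs(budget) < 1e-9 else (None, None)
    if budget <= 0:
        return None, None
    s = sum(A[h] for h in rest) / budget
    x.update({h: A[h] / s for h in rest})
    return x, s


def is_optimal_pair(n, A, m, M, L, U):
    """Assumed KKT check: A/s <= m on L, A/s >= M on U, strictly inside elsewhere."""
    x, s = x_from_sets(n, A, m, M, L, U)
    if x is None:
        return False
    if s is None:  # need some s with max_L A/m <= s <= min_U A/M
        lo = max((A[h] / m[h] for h in L), default=0.0)
        hi = min((A[h] / M[h] for h in U), default=float("inf"))
        return lo <= hi
    return (all(A[h] / s <= m[h] for h in L)
            and all(A[h] / s >= M[h] for h in U)
            and all(m[h] < x[h] < M[h] for h in A if h not in L and h not in U))


def optimal_pairs(n, A, m, M):
    """Brute force over take-min (L) / Neyman (N) / take-max (U) labelings."""
    strata = list(A)
    found = []
    for labels in product("LNU", repeat=len(strata)):
        L = {h for h, lab in zip(strata, labels) if lab == "L"}
        U = {h for h, lab in zip(strata, labels) if lab == "U"}
        if is_optimal_pair(n, A, m, M, L, U):
            found.append((L, U))
    return found
```

For $A = (1, 4, 15)$, $m = (2, 1, 1)$, $M = (10, 10, 10)$ and $n = 16$, `optimal_pairs` returns only $\mathcal L = \{0\}$, $\mathcal U = \{2\}$ (0-based stratum labels), with $s = 1$ and $x = (2, 4, 10)$. This example has no ties at the bounds, so the unique optimum shows up as a single pair.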

Figures (4)

  • Figure 4.1: Assignments of strata to set $\mathcal{L}$ (take-min) and set $\mathcal{U}$ (take-max) in the RNABOX algorithm for the example population given in Table \ref{tab:rnabox_example} and total sample size $n = 5110$. The $\sigma(\mathcal{H})$ axis corresponds to strata assigned to set $\mathcal{U}$, while the $\tau(\mathcal{H})$ axis corresponds to $\mathcal{L}$. Squares represent assignments of strata to $\mathcal{L}$ ($\square$) or $\mathcal{U}$ ($\blacksquare$); the coordinate of a given square is the last element of the set, following the order of strata ($\sigma$ or $\tau$) associated with the respective axis.
  • Figure 5.1: Running times of FPIA and RNABOX for two artificial populations. The top graphs show the empirical median of execution times (calculated from 100 repetitions) for different total sample sizes. Numbers in brackets are the numbers of iterations of a given algorithm. For RNABOX, this is a vector with the number of iterations of the RNA (see Step \ref{alg:rnabox:rna} of RNABOX) for each iteration of RNABOX; thus, the length of this vector equals the number of iterations of RNABOX. Counts of take-min, take-Neyman, and take-max strata are shown in the bottom graphs.
  • Figure A.1: Functions $g$ and $\phi$ for Example \ref{ex:fpia_blocked} of the allocation problem for which the FPIA gets blocked.
  • Figure A.2: Functions $g$ and $\phi$ for Example \ref{ex:fpia_blocked} of the allocation problem for which the FPIA does not converge.
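
Figures 4.1 and 5.1 refer to the evolution of the sets $\mathcal L$ and $\mathcal U$ across RNABOX iterations and to the vector of RNA iteration counts. The following self-contained Python sketch instruments a box-constrained recursion of the kind described in the abstract to record those quantities. It runs RNA with upper bounds, then reruns it after moving strata at or below their lower bounds into $\mathcal L$. It is not the code behind the figures. The names `rna_trace`, `rnabox_trace` and `strata_counts` are hypothetical, as is the convention that one RNA pass is one computation of the Neyman threshold, so counts may differ from those reported in the paper.

```python
def rna_trace(n, A, M):
    """Recursive Neyman allocation under upper bounds (sketch).

    Returns (allocation, take-max set, number of passes); a pass is one
    computation of the Neyman threshold s (our own counting convention).
    """
    U, passes = set(), 0
    while True:
        rest = [h for h in A if h not in U]
        if not rest:
            return {h: M[h] for h in A}, U, passes
        passes += 1
        s = sum(A[h] for h in rest) / (n - sum(M[h] for h in U))
        new = {h for h in rest if A[h] / s >= M[h]}
        if not new:
            x = {h: M[h] for h in U}
            x.update({h: A[h] / s for h in rest})
            return x, U, passes
        U |= new


def rnabox_trace(n, A, m, M):
    """Box-constrained recursion recording, per outer iteration,
    the take-min set L, the take-max set U and the RNA pass count."""
    L, history = set(), []
    while True:
        rest = [h for h in A if h not in L]
        x, U, passes = rna_trace(n - sum(m[h] for h in L),
                                 {h: A[h] for h in rest}, {h: M[h] for h in rest})
        history.append((sorted(L), sorted(U), passes))
        new_L = {h for h in rest if x[h] <= m[h]}
        if not new_L:
            x.update({h: m[h] for h in L})
            return x, history
        L |= new_L


def strata_counts(x, m, M, tol=1e-9):
    """(take-min, take-Neyman, take-max) counts; a value equal to a bound counts as that bound."""
    n_min = sum(1 for h in x if abs(x[h] - m[h]) <= tol)
    n_max = sum(1 for h in x if abs(x[h] - M[h]) <= tol)
    return n_min, len(x) - n_min - n_max, n_max
```

On the toy population $A = (1, 4, 15)$, $m_h = 1$, $M_h = 10$, $n = 15$, the trace is `[([], [2], 2), ([0], [2], 2)]`. There are two RNABOX iterations, each with two RNA passes, so the bracketed vector would read $(2, 2)$, and the final allocation has one take-min, one take-Neyman and one take-max stratum.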

Theorems & Definitions (23)

  • Definition 1.1
  • Definition 3.1
  • Definition 3.2
  • Theorem 3.1: Optimality conditions
  • Remark 3.1
  • Theorem 4.1
  • Example A.1
  • Example A.2
  • Example A.3
  • Remark B.1
  • ...and 13 more