Dixie cup problem in an interlacing process
Aristides V. Doumas
TL;DR
This paper studies a two-component version of the Dixie cup (coupon collector) problem in which the coupon types are formed by interlacing two distributions, one of which produces far rarer coupons than the other. Here $T_{m}(N;\alpha)$ denotes the number of purchases needed to complete $m$ sets of all coupon types. Using Poissonization and Euler–Maclaurin methods, the paper derives the leading-term asymptotics of $E\left[T_{m}(N;\alpha)\right]$ as $N\to\infty$ under broad growth conditions on the two sequences, and shows that both distributions contribute to the limit through the key integral expression $E\left[T_{m}(N;\alpha)\right]\sim (D_M+B_M)\int_{0}^{\infty}\left[1-\prod_{j=1}^{M}\left(1-S_m(d_ju)e^{-d_ju}\right)\right]du$, where $D_M=\sum_{j=1}^{M}d_j$ and $B_M=\sum_{j=1}^{M}b_j$. The work extends the result to more than two subfamilies, discusses the rising moments, and provides concrete examples (e.g., Zipf versus exponential), a table of leading terms, and supporting simulations. The results show how heterogeneity in coupon probabilities shapes the expected time to collect $m$ complete sets, and they generalize classic coupon-collector results to interlaced distributions. Potential applications lie in areas where heterogeneous rare events govern discovery processes.
Abstract
The double Dixie cup problem of D.J. Newman and L. Shepp is a well-known variant of the coupon collector problem, in which the object of study is the number of coupons that a collector has to buy in order to complete m sets of all N existing different coupons. In this paper we consider the case where the coupon distribution is a mixture of two different distributions, and the coupons from the first distribution are far rarer than those from the second. We apply a Poissonization technique, together with well-known results and techniques from our previous work, to derive the leading term of the asymptotics of the expectation of this random variable as N goes to infinity, for large classes of distributions. As it turns out, both distributions contribute to this result. The leading asymptotics of the rising moments of the same random variable are also discussed. We conclude by generalizing the problem to the case where the family of coupons is a mixture of j subfamilies.
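
To make the Poissonization step concrete, the following minimal sketch evaluates, for a finite collection of coupon probabilities $p_1,\dots,p_n$, the classical identity $E[T_m]=\int_{0}^{\infty}\left[1-\prod_{j}\left(1-S_m(p_jt)e^{-p_jt}\right)\right]dt$ and compares it with a direct simulation. It is an illustration under stated assumptions, not the paper's asymptotic derivation: $S_m(y)=\sum_{k=0}^{m-1}y^k/k!$ is assumed to be the usual partial exponential sum, the function names `s_m`, `expected_draws`, and `simulate_draws` are hypothetical, and the "rare" and "common" weights in the demo are made up for the example.

```python
import math
import random
from itertools import accumulate


def s_m(m, y):
    """Partial exponential sum S_m(y) = sum_{k=0}^{m-1} y^k / k! (assumed notation)."""
    term, total = 1.0, 1.0
    for k in range(1, m):
        term *= y / k
        total += term
    return total


def expected_draws(probs, m, step=0.01, tol=1e-12):
    """Trapezoid-rule estimate of the Poissonization integral for E[T_m].

    Integrates 1 - prod_j (1 - S_m(p_j t) e^{-p_j t}) over t >= 0 and stops
    once the integrand has decayed below `tol`.
    """
    def integrand(t):
        prod = 1.0
        for p in probs:
            prod *= 1.0 - s_m(m, p * t) * math.exp(-p * t)
        return 1.0 - prod

    total, t, prev = 0.0, 0.0, integrand(0.0)
    while prev > tol:
        t += step
        cur = integrand(t)
        total += 0.5 * (prev + cur) * step
        prev = cur
    return total


def simulate_draws(probs, m, rng):
    """Draw coupons until every type has been seen at least m times."""
    cum = list(accumulate(probs))
    types = range(len(probs))
    counts = [0] * len(probs)
    incomplete, draws = len(probs), 0
    while incomplete:
        j = rng.choices(types, cum_weights=cum)[0]
        draws += 1
        counts[j] += 1
        if counts[j] == m:  # type j has just completed its m-th copy
            incomplete -= 1
    return draws


if __name__ == "__main__":
    # Hypothetical interlaced family: three rare and three common coupon types.
    rare, common = [1.0] * 3, [5.0] * 3
    weight_sum = sum(rare) + sum(common)
    probs = [w / weight_sum for w in rare + common]
    rng = random.Random(1)
    trials = 2000
    mc = sum(simulate_draws(probs, 2, rng) for _ in range(trials)) / trials
    print(f"integral: {expected_draws(probs, 2):.2f}, simulation: {mc:.2f}")
```

For two equally likely coupons and $m=1$ the integrand is $2e^{-t/2}-e^{-t}$, so the integral equals $3$, which gives a quick sanity check of the numerical routine. The fixed step size is a simplification; very small probabilities would call for an adaptive or rescaled grid.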
