The Labeled Coupon Collector Problem with Random Sample Sizes and Partial Recovery
Shoham Shimon Berrebi, Eitan Yaakobi, Zohar Yakhini, Daniella Bar-Lev
TL;DR
This work generalizes the classic Coupon Collector problem into the Labeled Coupon Collector framework, introducing the k-LCCP (fixed-size samples) and the K-LCCP (random sample sizes), each with partial and complete recovery objectives. For partial recovery in the k-LCCP, it provides structural results, notably that in the 2-LCCP recovering the first label implies recovering two more, and it establishes a lower bound of $\big\lceil\frac{2n}{3}\big\rceil$ on complete recovery, with explicit expressions for specific recoveries. For complete recovery in the K-LCCP, the authors develop a Markov-chain model with states $s=(\alpha,\beta,\gamma)$ and derive a closed form for $n=3$. They also prove that the lower bounds coincide, $\mathrm{Min}_{K(p)\text{-LCCP}}(n)=\mathrm{Min}_{2\text{-LCCP}}(n)$ for $n\ge 3$, and conjecture that $\mathrm{Min}_{k\text{-LCCP}}(n)=\big\lceil\frac{2n}{k+1}\big\rceil$, which is proven in particular cases. The results connect labeled-edge recovery to Markovian sampling and show how heterogeneous sampling and partial information affect recovery complexity, with potential implications for data collection and DNA storage. The work also highlights nontrivial dependencies on the sampling distribution $K$ and motivates future extensions, including scenarios with multiple labels per coupon and broader CCP variants.
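As a quick consistency check on the two bounds quoted above, setting $k=2$ in the conjectured formula reproduces the 2-LCCP bound. This is only a sketch: it assumes that $\mathrm{Min}_{k\text{-LCCP}}(n)$ denotes the minimum number of samples that can suffice for complete recovery of $n$ coupons, and the numerical values of $n$ are illustrative rather than taken from the paper.

$$\Big\lceil\frac{2n}{k+1}\Big\rceil\bigg|_{k=2}=\Big\lceil\frac{2n}{3}\Big\rceil,\qquad n=3:\ \Big\lceil\frac{6}{3}\Big\rceil=2,\qquad n=10:\ \Big\lceil\frac{20}{3}\Big\rceil=7.$$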
Abstract
We extend the Coupon Collector's Problem (CCP) and present a novel generalized model, referred to as the k-LCCP, in which the goal is to recover a bipartite graph with a perfect matching that represents the coupons and their matching labels. We present two further extensions of this variation: the heterogeneous sample size case (K-LCCP) and the partial recovery case.
