The Labeled Coupon Collector Problem with Random Sample Sizes and Partial Recovery

Shoham Shimon Berrebi, Eitan Yaakobi, Zohar Yakhini, Daniella Bar-Lev

TL;DR

This work generalizes the classic Coupon Collector problem into the Labeled Coupon Collector framework, introducing the k-LCCP (fixed-size samples) and the K-LCCP (random sample sizes) with partial and complete recovery objectives. It provides structural results for partial recovery in the k-LCCP, notably showing that in the 2-LCCP model the first recovered label implies two more and establishing a lower bound of $\big\lceil\frac{2n}{3}\big\rceil$ on complete recovery, with explicit expressions for specific recoveries. For complete recovery in the K-LCCP, the authors develop a Markov-chain model with states $s=(\alpha,\beta,\gamma)$ and derive a closed form for $n=3$. They also prove that $\mathrm{Min}_{K(p)\text{-LCCP}}(n)=\mathrm{Min}_{2\text{-LCCP}}(n)$ for $n\ge 3$ and conjecture that $\mathrm{Min}_{k\text{-LCCP}}(n)=\big\lceil\frac{2n}{k+1}\big\rceil$, which is proven in particular cases. The results connect labeled-edge recovery to Markovian sampling and provide insights into how heterogeneous sampling and partial information affect recovery complexity, with potential implications for data-collection and DNA storage contexts. The work also highlights nontrivial dependencies on the sampling distribution $K$ and motivates future extensions, including multi-label-per-coupon scenarios and broader CCP variants.
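As a quick arithmetic check of the two expressions quoted above, the following minimal sketch evaluates $\big\lceil\frac{2n}{k+1}\big\rceil$ and confirms that it reduces to $\big\lceil\frac{2n}{3}\big\rceil$ at $k=2$. The function name `conjectured_min_samples` and the input check are illustrative assumptions, not taken from the paper; the summary does not state the range of $k$ for which the conjecture is intended, so the sketch only requires positive integers.

```python
def conjectured_min_samples(n: int, k: int) -> int:
    """Evaluate ceil(2n / (k + 1)) using exact integer arithmetic.

    Hypothetical helper for illustration only: it evaluates the closed-form
    expression quoted in the summary and does not model the sampling process.
    """
    if n < 1 or k < 1:
        # Assumption: only positive integers are meaningful here.
        raise ValueError("n and k must be positive integers")
    # ceil(a / b) == (a + b - 1) // b for positive integers, with b = k + 1.
    return (2 * n + k) // (k + 1)


if __name__ == "__main__":
    for n in (3, 4, 10, 2000):
        # With k = 2 the expression is ceil(2n / 3).
        print(n, conjectured_min_samples(n, 2))
```

For $n=3$ the value is $2$, and for $n=2000$ (the size used in Figure 2) it is $1334$.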

Abstract

We extend the Coupon Collector's Problem (CCP) and present a novel generalized model, referred to as the k-LCCP, in which one is interested in recovering a bipartite graph with a perfect matching that represents the coupons and their matching labels. We present two further extensions of this variation: the heterogeneous sample size case (K-LCCP) and the partial recovery case.

Paper Structure

This paper contains 6 sections, 8 theorems, 19 equations, 2 figures.

Key Result

Lemma 1

For any integers $0 < k < n$, it holds that $T_{k\text{-LCCP}}(n)$ and $T_{(n-k)\text{-LCCP}}(n)$ have the same distribution.

Figures (2)

  • Figure 1: CCP by collecting edges in a bipartite graph. The coupons ($C$) are matched to the labels ($L$) via edges, representing samples. This graph-based representation highlights the CCP as a problem of reconstructing all edges with minimal sampling.
  • Figure 2: The normalized value of $T_{K(p)\text{-LCCP}}(n)$ as a function of $p\in[0,1]$, for $n=2000$. For each $p \in \{0,0.1,\ldots,1\}$, the exact value of $\mathbb{E}[T_{K(p)\text{-LCCP}}(n)]$, calculated by the Markov approach, is shown by the green curve, while the dashed red line represents the convex combination. Additionally, box plots show simulation results for $T_{K(p)\text{-LCCP}}(n)$, with $1000$ instances for each relevant value of $p$.

Theorems & Definitions (21)

  • Definition 1
  • Definition 2
  • Example 1
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Claim 1
  • Claim 2
  • Theorem 1
  • ...and 11 more