The German Tank Problem with Multiple Factories

Steven J. Miller, Kishan Sharma, Andrew K. Yang

TL;DR

This work extends the classical German Tank Problem to a setting with $l$ factories, unknown gaps $G_i$, and total production $N_{\text{tot}}=\sum_i N_i$, under sampling without replacement. It develops the GTP-UM estimator for unknown minimum, proves it is the MVUE with variance $\operatorname{Var}(\hat N_{\text{UM}})=\dfrac{2(N+1)(N-k)}{(k-1)(k+2)}$, and shows the original GTP is the MVUE in its setting. For the multi-factory problem, it analyzes the probability of missing a factory, reveals a threshold in the asymptotic regime for when samples cover all factories, and proposes a gap-informed approach that partitions samples to apply GTP to the first factory and GTP-UM to the rest; simulations demonstrate improved accuracy with sufficient samples and favorable gap structures. When restricting to equal factory sizes and known fixed gaps, the paper derives a simple unbiased estimator $\hat N=\frac{1}{l}(2M - G(l-1) - 1)$ (and a useful large-$N$ approximation), yielding substantial variance reductions even with modest sample sizes. Overall, the work provides principled MVUE results for the GTP variants and practical, robust strategies for estimating total production under complex multi-factory structures.
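The variance formula quoted above can be checked exactly on small cases. The sketch below is illustrative only: this summary does not state the GTP-UM estimator itself, so the code assumes the spread-based form $(M-m)\left(1+\frac{2}{k-1}\right)-1$, where $m$ is the minimum observed serial number. The function names and the `offset` parameter (standing in for the unknown starting serial) are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def gtp_um_estimate(sample):
    """Assumed spread-based estimator: (M - m) * (1 + 2/(k-1)) - 1."""
    k = len(sample)
    spread = max(sample) - min(sample)
    # 1 + 2/(k-1) equals (k+1)/(k-1); Fraction keeps the arithmetic exact.
    return Fraction(spread * (k + 1), k - 1) - 1


def exact_mean_and_variance(n_tanks, k, offset=0):
    """Enumerate every size-k sample drawn without replacement.

    Serial numbers run from offset+1 to offset+n_tanks; the offset plays
    the role of the unknown minimum and should not affect the result.
    """
    serials = range(offset + 1, offset + n_tanks + 1)
    estimates = [gtp_um_estimate(s) for s in combinations(serials, k)]
    mean = sum(estimates, Fraction(0)) / len(estimates)
    variance = sum((e - mean) ** 2 for e in estimates) / len(estimates)
    return mean, variance


def claimed_variance(n_tanks, k):
    """Variance quoted in the TL;DR: 2(N+1)(N-k) / ((k-1)(k+2))."""
    return Fraction(2 * (n_tanks + 1) * (n_tanks - k), (k - 1) * (k + 2))


if __name__ == "__main__":
    mean, variance = exact_mean_and_variance(n_tanks=5, k=3, offset=100)
    print(mean, variance, claimed_variance(5, 3))
```

Exhaustive enumeration is only feasible for small $N$ and $k$, but it avoids Monte Carlo noise, so agreement with the quoted formula is exact rather than approximate.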

Abstract

During the Second World War, estimates of the number of tanks deployed by Germany were critically needed. The Allies adopted a successful statistical approach to estimate this information: assume that the tanks are sequentially numbered starting from, say, 1, and ending at an unknown positive integer $N$. If we observe the numbers of $k$ tanks, then the best linear unbiased estimator for $N$ is $M(1+1/k)-1$ where $M$ is the maximum observed serial number. While this approach was successful, there are many more adversarial situations where the approach for the original German Tank Problem falls short. Typically the number of "factories" is a possibly unknown $l>1$, and tanks produced by different factories may have serial numbers in disjoint ranges that are often separated by unknown amounts. Clark, Gonye and Miller (CGM) presented an unbiased estimator for $N$ when the minimum serial number is unknown. So if one can identify which samples correspond to which factory, one can then estimate each factory's range using CGM's method, and sum them for an estimate of the rival's total productivity. We present a procedure to estimate the total productivity and prove that it is effective when $\log l/\log k$ is sufficiently small. In the final section, we show that if we have a small number of samples, we can make an estimator that performs orders of magnitude better when given additional information about the size of the gaps.
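As a small illustration of the classical estimator $M(1+1/k)-1$ described in the abstract, the following sketch computes it for one sample and confirms unbiasedness by enumerating every possible sample for a small $N$. The function names are hypothetical, and the sample values are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations


def gtp_estimate(sample):
    """Classical estimator M * (1 + 1/k) - 1, with M the sample maximum."""
    k = len(sample)
    # 1 + 1/k equals (k+1)/k; Fraction keeps the arithmetic exact.
    return Fraction(max(sample) * (k + 1), k) - 1


def exact_mean(n_tanks, k):
    """Average the estimator over all size-k samples from 1..n_tanks."""
    estimates = [
        gtp_estimate(s) for s in combinations(range(1, n_tanks + 1), k)
    ]
    return sum(estimates, Fraction(0)) / len(estimates)


if __name__ == "__main__":
    # Four observed serial numbers with maximum 60: 60 * 5/4 - 1 = 74.
    print(gtp_estimate([19, 40, 42, 60]))
    # Averaged over every possible sample, the estimate recovers N exactly.
    print(exact_mean(n_tanks=10, k=3))
```

The enumeration assumes serial numbers start at 1 and that samples are drawn uniformly without replacement, matching the setting stated in the abstract.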

Paper Structure (16 sections, 7 theorems, 52 equations, 7 figures)

Key Result

Theorem 2.1

We have for the German Tank Problem with Unknown Minimum that

Figures (7)

  • Figure 1: Statistics vs Intelligence estimates for German tank production.
  • Figure 2: Probability of missing at least one factory $P_{N,l,k}$ against the number of samples $k$ for different values of factory size $N$ and number of factories $l$. The jagged behaviour occurs due to the rounding of $l$ to integer values.
  • Figure 3: Probability of missing at least one factory $P_{N,l,k}$ against the number of samples $k$ for different values of the number of factories $l$ and $N\to\infty$.
  • Figure 4: Mean Squared Error of 10,000 estimations by the MFP plotted against sample size $k$.
  • Figure 5: Mean Squared Error of 10,000 estimations by the MFP plotted against sample size $k$.
  • ...and 2 more figures

Theorems & Definitions (21)

  • Theorem 2.1
  • Remark 2.2
  • proof
  • Theorem 2.3
  • proof
  • Example 3.1
  • Theorem 3.2
  • proof
  • Lemma 3.3
  • proof
  • ...and 11 more