Cardinalities in proofreading

Martin Klazar

TL;DR

The paper addresses how to rigorously justify Pólya's heuristic $e \approx \frac{ab}{c}$ for estimating the total number of errors from the findings of two proofreaders. It formalizes proofreading through $(p,n)$-proofreaders and pairs of independent $(p,q,n)$-proofreaders, and uses a large-deviation bound from the probabilistic method to show that, for a text with $m$ errors and a tolerance $a$, the fraction of runs for which $\left|\frac{\|P(s_v,t)\| \cdot \|P(u_w,t)\|}{\|P(s_v,P(u_w,t))\|} - m\right| \le a$ is at least $1 - 6\exp\left(-\frac{2 c(p,q)^2 a^2}{m}\right)$, where $c(p,q)=\tfrac{1}{2}\big(\tfrac{1}{p}+\tfrac{1}{q}+\tfrac{2}{pq}\big)^{-1}$. Together with its corollaries and the omega-based refinements, this result justifies and quantifies Pólya's estimate under explicit modeling assumptions, and the paper closes by clarifying the roles of realistic versus probabilistic models in such analyses.
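The bound above can be evaluated numerically. The following Python sketch computes $c(p,q)$, the guaranteed lower bound on the fraction of good runs, and, by inverting the bound, the smallest tolerance $a$ that guarantees a prescribed fraction $1-\varepsilon$. The helper names (`c_pq`, `good_fraction_lower_bound`, `tolerance_for`) are hypothetical; they only transcribe the formula stated above and are not code from the paper.

```python
import math


def c_pq(p: float, q: float) -> float:
    """Constant c(p,q) = 1/2 * (1/p + 1/q + 2/(p*q))**(-1) from the stated bound."""
    return 0.5 / (1.0 / p + 1.0 / q + 2.0 / (p * q))


def good_fraction_lower_bound(m: int, a: float, p: float, q: float) -> float:
    """Lower bound 1 - 6*exp(-2*c(p,q)**2 * a**2 / m) on the fraction of runs
    whose estimate lies within a of m. A negative value means the bound is vacuous."""
    return 1.0 - 6.0 * math.exp(-2.0 * c_pq(p, q) ** 2 * a ** 2 / m)


def tolerance_for(m: int, p: float, q: float, eps: float) -> float:
    """Smallest tolerance a for which the bound guarantees a fraction >= 1 - eps.

    Solving 6*exp(-2*c**2*a**2/m) <= eps for a gives
    a >= sqrt(m * ln(6/eps) / 2) / c.
    """
    return math.sqrt(m * math.log(6.0 / eps) / 2.0) / c_pq(p, q)


if __name__ == "__main__":
    for m in (100, 10_000, 1_000_000):
        a = tolerance_for(m, 0.5, 0.5, 0.05)
        print(f"m={m}: a={a:.1f}, a/m={a / m:.4f}")
```

Because the tolerance returned by `tolerance_for` grows like $\sqrt{m}$, the guaranteed relative error $a/m$ shrinks like $1/\sqrt{m}$. For example, with $p=q=1/2$ (so $c(p,q)=1/24$) and $\varepsilon=0.05$, the guaranteed tolerance exceeds $m$ itself at $m=100$, but is below 4% of $m$ at $m=10^6$.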

Abstract

In 1975, G. Pólya suggested that if two proofreaders found $a$ and $b$ errors in a text, of which $c$ errors were found by both of them, then a reasonable approximation of the unknown number $e$ of all errors is $e\approx ab/c$. We justify this formula by constructing a realistic model of proofreaders and estimating the efficiency of this model with the help of the bound on large deviations in the Probabilistic Method. In conclusion we discuss the distinction between realistic and probabilistic models of problems.
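Pólya's estimate is easy to try out. The sketch below implements $e \approx ab/c$ and checks it on simulated data. The simulation assumes the simple probabilistic model in which each of $m$ errors is caught independently with probability $p$ by the first proofreader and with probability $q$ by the second. This is an illustrative assumption, not the realistic $(p,q,n)$-proofreader model that the paper constructs, and the function names are hypothetical.

```python
import random


def polya_estimate(a: int, b: int, c: int) -> float:
    """Pólya's estimate e ~ a*b/c of the total number of errors.

    Heuristic: c/b approximates the fraction of errors the first reader catches,
    and a approximates that fraction times e, so e ~ a / (c/b) = a*b/c.
    """
    if c == 0:
        raise ValueError("no common errors: the estimate a*b/c is undefined")
    return a * b / c


def simulate(m: int, p: float, q: float, rng: random.Random) -> tuple[int, int, int]:
    """Assumed probabilistic stand-in: each of m errors is found by reader 1 with
    probability p and, independently, by reader 2 with probability q.
    Returns (a, b, c) = (found by reader 1, found by reader 2, found by both)."""
    found1 = {i for i in range(m) if rng.random() < p}
    found2 = {i for i in range(m) if rng.random() < q}
    return len(found1), len(found2), len(found1 & found2)


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed for reproducibility
    m, p, q = 10_000, 0.6, 0.5
    a, b, c = simulate(m, p, q, rng)
    print(a, b, c, round(polya_estimate(a, b, c)))
```

As a worked example, if the proofreaders find $a=60$ and $b=50$ errors with $c=30$ in common, the estimate is $e \approx 60 \cdot 50 / 30 = 100$.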

Paper Structure

This paper contains 3 sections, 8 theorems, and 45 equations.

Key Result

Proposition 1.2

Let $\overline{s}=(\overline{s}_1,\overline{s}_2,\dots,\overline{s}_N)$ be the runs of the $(p,n)$-proofreader. Then for every set $I\subset[n]$ we have

Theorems & Definitions (10)

  • Definition 1.1: $(p,n)$-proofreaders
  • Proposition 1.2
  • Definition 1.3: two $(p,q,n)$-proofreaders
  • Proposition 1.4
  • Theorem 1.5
  • Corollary 1.6
  • Corollary 1.7
  • Proposition 2.1
  • Theorem 2.2
  • Theorem 2.3: Corollary A.1.7 in PM