Cardinalities in proofreading

Martin Klazar

TL;DR

The paper gives a rigorous justification of Pólya's heuristic $e \approx \frac{ab}{c}$ for estimating the total number of errors from the findings of two proofreaders. It formalizes proofreading in terms of $(p,n)$-proofreaders and of two independent $(p,q,n)$-proofreaders, and uses the probabilistic method to derive a large-deviation bound: for a text $t$ with $m$ errors and a tolerance $a$ (not to be confused with the count $a$ in Pólya's formula), the fraction of runs for which $\left|\frac{\|P(s_v,t)\| \cdot \|P(u_w,t)\|}{\|P(s_v,P(u_w,t))\|} - m\right| \le a$ is at least $1 - 6\exp\left(-\frac{2 c(p,q)^2 a^2}{m}\right)$, with $c(p,q)=\tfrac{1}{2}\big(\tfrac{1}{p}+\tfrac{1}{q}+\tfrac{2}{pq}\big)^{-1}$. Together with corollaries and omega-based refinements, this result quantifies the accuracy of Pólya's estimate under explicit modeling assumptions, and it clarifies the distinction between realistic and probabilistic models of such problems.
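To get a feel for the bound, the following minimal sketch evaluates $c(p,q)$ and the lower bound $1 - 6\exp\left(-\frac{2c(p,q)^2a^2}{m}\right)$ exactly as written above. The function names `c_const` and `fraction_lower_bound` are hypothetical, and the code assumes only that $p$ and $q$ are positive numbers; it does not encode the paper's definition of $(p,q,n)$-proofreaders.

```python
import math


def c_const(p, q):
    """c(p, q) = (1/2) * (1/p + 1/q + 2/(p*q))^(-1), as stated in the TL;DR."""
    if p <= 0 or q <= 0:
        raise ValueError("p and q are assumed to be positive")
    return 0.5 / (1.0 / p + 1.0 / q + 2.0 / (p * q))


def fraction_lower_bound(m, a, p, q):
    """Lower bound 1 - 6*exp(-2*c(p,q)^2 * a^2 / m) on the fraction of runs
    whose estimate lies within tolerance a of the true error count m."""
    if m <= 0:
        raise ValueError("m is assumed to be a positive number of errors")
    c = c_const(p, q)
    return 1.0 - 6.0 * math.exp(-2.0 * c * c * a * a / m)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for a in (10, 50, 100, 200):
        print(a, fraction_lower_bound(m=100, a=a, p=1, q=1))
```

For small tolerances the expression is negative, so the bound is vacuous there; it becomes informative only once $a$ is large relative to $\sqrt{m}/c(p,q)$.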

Abstract

In 1975, G. Pólya suggested that if two proofreaders found $a$ and $b$ errors in a text, of which $c$ errors were found by both of them, then a reasonable approximation of the unknown number $e$ of all errors is $e\approx ab/c$. We justify this formula by constructing a realistic model of proofreaders and estimating the efficiency of this model with the help of the bound on large deviations in the Probabilistic Method. In conclusion we discuss the distinction between realistic and probabilistic models of problems.
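Pólya's formula can be tried out numerically. The sketch below is an illustration under a simplifying assumption that is not the paper's model: each proofreader detects every error independently with a fixed probability (`p` for the first, `q` for the second). The names `polya_estimate` and `simulate_proofreaders` are hypothetical.

```python
import random


def polya_estimate(a, b, c):
    """Pólya's estimate e ≈ a*b/c of the total number of errors."""
    if c == 0:
        raise ValueError("no common errors: the estimate a*b/c is undefined")
    return a * b / c


def simulate_proofreaders(m, p, q, seed=0):
    """Simulate two independent proofreaders on a text with m errors.

    Assumption (not from the paper): each error is found independently
    with probability p by the first reader and q by the second.
    Returns (a, b, c): errors found by the first, by the second, by both.
    """
    rng = random.Random(seed)
    first = {i for i in range(m) if rng.random() < p}
    second = {i for i in range(m) if rng.random() < q}
    return len(first), len(second), len(first & second)


if __name__ == "__main__":
    a, b, c = simulate_proofreaders(m=1000, p=0.6, q=0.5, seed=1)
    print(a, b, c, polya_estimate(a, b, c))
```

Under this assumption $a \approx pm$, $b \approx qm$ and $c \approx pqm$, so $ab/c \approx m$, which is the heuristic reason the formula works.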

