Using Chao's Estimator as a Stopping Criterion for Technology-Assisted Review

Michiel P. Bron; Peter G. M. van der Heijden; Ad J. Feelders; Arno P. J. M. Siebes

TL;DR

This paper introduces a stopping criterion for Technology-Assisted Review based on Population Size Estimation using Chao's estimator to bound the total number of relevant documents $|\text{D}^+|$. It integrates two versions of Chao's estimator (Chao 1987 and Chao Rivest) within an ensemble Active Learning TAR framework that allows single- and multi-user document ranking, sampling, and decision-making. Through extensive simulations on diverse TAR datasets, the authors compare estimator-based stopping with existing criteria, showing that the Rivest variant often yields superior recall and work savings, while the conservative Chao 1987 approach provides robust reliability. The work demonstrates that PSE-based stopping can offer formal stopping guarantees and informative recall estimates, potentially improving decision support for reviewers in large-scale literature searches. Practical impact includes more reliable stopping decisions in systematic reviews and related text screening tasks, with clear trade-offs between recall guarantee and reader workload.

Abstract

Technology-Assisted Review (TAR) aims to reduce the human effort required for screening processes such as abstract screening for systematic literature reviews. Human reviewers label documents as relevant or irrelevant during this process, while the system incrementally updates a prediction model based on the reviewers' previous decisions. After each model update, the system proposes new documents it deems relevant, to prioritize relevant documents over irrelevant ones. A stopping criterion is necessary to guide users in stopping the review process so as to minimize both the number of missed relevant documents and the number of irrelevant documents read. In this paper, we propose and evaluate a new ensemble-based Active Learning strategy and a stopping criterion based on Chao's Population Size Estimator that estimates the prevalence of relevant documents in the dataset. Our simulation study compares this criterion with other methods presented in the literature and demonstrates that it performs well on several datasets.
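
To make the idea of a population-size-based stopping rule concrete, the following is a minimal sketch of the classic Chao (1987) lower-bound estimator, N = n + f1^2 / (2 f2), used as a point estimate of the number of relevant documents. It assumes that f_k counts the relevant documents found by exactly k committee members; the function names, the `target_recall` parameter, and the use of a point estimate rather than an interval bound are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def chao1987_estimate(finds_per_member):
    """Point estimate of the total number of relevant documents.

    finds_per_member: list of sets, one per committee member, holding the
    ids of the relevant documents that member has found (assumed layout).
    """
    # How many members found each relevant document.
    times_found = Counter(doc for found in finds_per_member for doc in found)
    n_found = len(times_found)

    # f1: found by exactly one member, f2: found by exactly two members.
    freq = Counter(times_found.values())
    f1, f2 = freq.get(1, 0), freq.get(2, 0)

    if f2 > 0:
        unseen = f1 * f1 / (2 * f2)
    else:
        # Bias-corrected form f1 (f1 - 1) / (2 (f2 + 1)) evaluated at f2 = 0.
        unseen = f1 * (f1 - 1) / 2
    return n_found + unseen


def should_stop(finds_per_member, target_recall=0.95):
    """Stop once the estimated recall reaches the (hypothetical) target."""
    found = len(set().union(*finds_per_member)) if finds_per_member else 0
    if found == 0:
        return False
    return found / chao1987_estimate(finds_per_member) >= target_recall


members = [{1, 2, 3, 4}, {1, 2, 5}, {1, 3, 6}]
print(chao1987_estimate(members))  # 6 found, f1 = 3, f2 = 2
print(should_stop(members))
```

A more conservative rule would compare the number of documents found against an upper confidence bound of the estimate instead of the point estimate, at the cost of reading more documents before stopping.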

Paper Structure (59 sections, 20 equations, 35 figures, 10 tables, 1 algorithm)

Introduction
Related Work
Stopping Criteria
Pragmatic Criteria (Standoff & Heuristics)
Baseline Inclusion Rate (Hybrid & Heuristic)
Target method (Hybrid & Heuristic)
Knee method (Standoff & Heuristic)
Budget method (Standoff & Heuristic)
AutoStop (Interventional & Certification)
Quant (CI) Rule (Standoff & Certification)
Hypergeometric method (Hybrid, Standoff & Certification)
Methodology
Population Size Estimation for Technology-Assisted Review
PSE for Search Tasks
PSE without multiple reviewers
...and 44 more sections

Figures (35)

Figure 1: An architectural overview of our method. The Active Learning module consists of several committee members $\{\mathcal{C}_1, \dots, \mathcal{C}_n \}$, each with its own labeled and unlabeled state. Each of the members can have a Machine Learning Model (for illustrative purposes represented as an Artificial Neural Network). The rankings of the members are combined by going through each member in a round-robin or random fashion and selecting the top of the stack. The estimation module can query the labeled states of each of the members to construct a contingency table and fit a PSE model.
Figure 2: An example run for 500 iterations on a dataset. The Ensemble curve shows the number of documents that have been found by the overall system. The other curves display the number of relevant documents that have been found by the individual members within $\mathcal{C}$. The reader may notice that the curves start slightly after 0 documents and end slightly after 500 documents. This shift occurs because our method requires five relevant and five irrelevant documents at the start of the process (see Section \ref{sec-training-ranking-sampling}).
Figure 3: Calculating the 95% confidence interval using the profile likelihood method for the frequency statistics in Table \ref{tbl-fstats}. In this figure, the log-likelihood for $u^\star$ is subtracted from the likelihood to aid the visualization, and the $y$-axis is inverted for the same purpose. The values of $u$ at which the curve intersects the line $y = k_{\alpha=0.05} = 3.84$ give the lower and upper bounds of the interval.
Figure 4: Chao -- CLEF2017-CD011548 dataset (ours)
Figure 5: Chao -- Van Dis dataset (ours)
...and 30 more figures
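
The round-robin combination of member rankings described in the Figure 1 caption can be sketched as follows. This is an illustrative simplification, not the authors' code: it assumes each member supplies a best-first list of document ids and that a single shared set of labeled documents is used, whereas the caption states that each member keeps its own labeled and unlabeled state. The function name and parameters are hypothetical.

```python
from itertools import cycle


def round_robin_select(rankings, already_labeled, batch_size):
    """Pick documents by visiting members in turn and taking each one's
    highest-ranked document that is neither labeled nor already picked."""
    taken = set(already_labeled)
    cursors = [0] * len(rankings)  # next position to inspect per member
    batch = []
    idle_turns = 0  # consecutive members with nothing left to offer

    for member in cycle(range(len(rankings))):
        if len(batch) >= batch_size or idle_turns >= len(rankings):
            break
        ranking = rankings[member]
        # Skip documents that are labeled or already in the batch.
        while cursors[member] < len(ranking) and ranking[cursors[member]] in taken:
            cursors[member] += 1
        if cursors[member] == len(ranking):
            idle_turns += 1
            continue
        doc = ranking[cursors[member]]
        batch.append(doc)
        taken.add(doc)
        idle_turns = 0
    return batch


rankings = [["a", "b", "c"], ["b", "a", "d"], ["c", "d", "e"]]
print(round_robin_select(rankings, already_labeled={"a"}, batch_size=4))
```

The random variant mentioned in the caption could be obtained by drawing the next member at random instead of cycling through them in a fixed order.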
