RLStop: A Reinforcement Learning Stopping Method for TAR
Reem Bin-Hezam, Mark Stevenson
TL;DR
RLStop addresses the TAR stopping problem by using reinforcement learning to learn a stopping policy over ranked batches of documents, rather than relying on statistical assumptions that may not hold in practice. It treats examination of the ranking as a sequential decision problem and trains a neural policy with PPO, using a reward that balances reaching the target recall against the number of documents examined. Evaluations on six TAR benchmarks show that RLStop often matches or nearly matches the oracle and yields substantial workload reductions compared with baselines. Practical limitations include the need for topic-specific training data and the possible need to train a separate model for each target recall level.
Abstract
We present RLStop, a novel Technology Assisted Review (TAR) stopping rule based on reinforcement learning that helps minimise the number of documents that need to be manually reviewed within TAR applications. RLStop is trained on example rankings using a reward function to identify the optimal point to stop examining documents. Experiments at a range of target recall levels on multiple benchmark datasets (CLEF e-Health, TREC Total Recall, and Reuters RCV1) demonstrated that RLStop substantially reduces the workload required to screen a document collection for relevance. RLStop outperforms a wide range of alternative approaches, achieving performance close to the maximum possible for the task under some circumstances.
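As a rough illustration of the sequential decision framing, the following sketch steps through a ranking one batch at a time and scores the point at which a policy chooses to stop. It is not the RLStop implementation: the class `StoppingEnv`, the helper `oracle_stop`, the state features, the batch size, and the shape of the reward are all assumptions made for this example, and PPO training itself is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

CONTINUE, STOP = 0, 1  # the two actions available after each batch


@dataclass
class StoppingEnv:
    """Toy episode over one non-empty ranking, examined in fixed-size batches."""
    labels: List[int]            # relevance labels (1/0) in ranked order
    target_recall: float = 0.9
    batch_size: int = 2          # hypothetical value, chosen for the example
    position: int = field(default=0, init=False)  # documents examined so far

    def reset(self) -> Tuple[float, int]:
        # The first batch is always examined before any decision is made.
        self.position = min(self.batch_size, len(self.labels))
        return self._state()

    def _state(self) -> Tuple[float, int]:
        # Assumed features: fraction examined and relevant documents found.
        return (self.position / len(self.labels),
                sum(self.labels[:self.position]))

    def _recall(self) -> float:
        total = sum(self.labels)
        return sum(self.labels[:self.position]) / total if total else 1.0

    def _final_reward(self) -> float:
        # Assumed reward: if the target is met, reward the effort saved;
        # otherwise penalise by the recall shortfall.
        cost = self.position / len(self.labels)
        recall = self._recall()
        if recall >= self.target_recall:
            return 1.0 - cost
        return recall - self.target_recall

    def step(self, action: int) -> Tuple[Tuple[float, int], float, bool]:
        if action == CONTINUE and self.position < len(self.labels):
            self.position = min(self.position + self.batch_size,
                                len(self.labels))
            return self._state(), 0.0, False
        # STOP, or the ranking is exhausted: the episode ends here.
        return self._state(), self._final_reward(), True


def oracle_stop(labels: List[int], target_recall: float, batch_size: int) -> int:
    """Earliest batch boundary at which the target recall is reached."""
    total = sum(labels)
    for end in range(batch_size, len(labels) + batch_size, batch_size):
        end = min(end, len(labels))
        if total == 0 or sum(labels[:end]) / total >= target_recall:
            return end
    return len(labels)
```

Under this assumed reward, stopping exactly at the oracle position gives the highest return for a ranking, stopping later is still positive but smaller, and stopping before the target recall is reached is negative. A policy trained with PPO would be expected to learn when to emit `STOP` from the state alone, since recall is only observable when training labels are available.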
