RLStop: A Reinforcement Learning Stopping Method for TAR

Reem Bin-Hezam, Mark Stevenson

TL;DR

RLStop addresses the TAR stopping problem by learning a stopping policy over ranked batches via reinforcement learning, rather than relying on statistical assumptions that may not hold. It models the ranking as a sequential decision problem and optimizes a reward that balances reaching the target recall against examining as few documents as possible, using PPO to train a neural policy. Evaluations on six TAR benchmarks show that RLStop often matches or nearly matches the oracle and yields substantial workload reductions compared with baselines. Practical limitations include the need for topic-specific training data and, potentially, a separately trained model for each target recall level.

Abstract

We present RLStop, a novel Technology Assisted Review (TAR) stopping rule based on reinforcement learning that helps minimise the number of documents that need to be manually reviewed within TAR applications. RLStop is trained on example rankings using a reward function to identify the optimal point to stop examining documents. Experiments at a range of target recall levels on multiple benchmark datasets (CLEF e-Health, TREC Total Recall, and Reuters RCV1) demonstrated that RLStop substantially reduces the workload required to screen a document collection for relevance. RLStop outperforms a wide range of alternative approaches, achieving performance close to the maximum possible for the task under some circumstances.
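
To make the evaluation terms above concrete, the following is a minimal sketch of how recall, cost, and an oracle stopping point could be computed for a single ranking. It is not the authors' implementation and does not reproduce RLStop's reward function or PPO policy. The function names (`oracle_stop`, `recall_and_cost`), the `batch_size` parameter, and the definition of cost as the fraction of the ranking examined are all assumptions made for illustration.

```python
from typing import List, Tuple


def oracle_stop(relevance: List[int], target_recall: float, batch_size: int = 1) -> int:
    """Earliest batch boundary at which the target recall is reached.

    `relevance` holds 0/1 labels in rank order. Stopping is only allowed at
    multiples of `batch_size` (an assumed granularity) or at the end of the
    ranking. Returns 0 if the ranking contains no relevant documents.
    """
    total = sum(relevance)
    if total == 0:
        return 0
    n = len(relevance)
    found = 0
    for depth, rel in enumerate(relevance, start=1):
        found += rel
        at_boundary = depth % batch_size == 0 or depth == n
        if at_boundary and found / total >= target_recall:
            return depth
    return n


def recall_and_cost(relevance: List[int], stop: int) -> Tuple[float, float]:
    """Recall achieved and fraction of the ranking examined when stopping at `stop`."""
    total = sum(relevance)
    recall = sum(relevance[:stop]) / total if total else 1.0
    cost = stop / len(relevance) if relevance else 0.0
    return recall, cost


if __name__ == "__main__":
    # Hypothetical ranking: relevant documents at ranks 1, 3, 4 and 7.
    ranking = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
    stop = oracle_stop(ranking, target_recall=0.75)
    print(stop, recall_and_cost(ranking, stop))
```

In this hypothetical ranking, a target recall of 0.75 is first reached at rank 4, so any stopping method that halts later pays extra cost, and any method that halts earlier misses the target. A learned stopping policy would be compared against this reference point.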

Paper Structure (10 sections, 2 equations, 2 figures)

Figures (2)

  • Figure 1: Performance of RLStop and baselines for Recall vs. Cost metrics. Grey lines indicate non-oracle Pareto optimal approaches. (Note the differences in the range of the y-axis (recall), used to avoid clustering of results.)
  • Figure 2: Distribution of RLStop excess across topics. (Outliers for target recall 1.0 removed for clarity.)