Time Is Effort: Estimating Human Post-Editing Time for Grammar Error Correction Tool Evaluation

Ankit Vadehra, Bill Johnson, Gene Saunders, Pascal Poupart

TL;DR

Time Is Effort introduces a human-centered approach to evaluating Grammar Error Correction (GEC) tools: Post-Editing Effort in Time (PEET), which estimates the post-editing (PE) time a human needs to correct a tool's output. The paper provides a large-scale dataset of PE time annotations for BEA19 and CoNLL14, collected with two strong GEC tools (GECToR and GEC-PD) and professional editors, and proposes the PEET Scorer, a regression-based model that predicts time-to-correct from ERRANT-derived edit features. The results show that paraphrasing and punctuation edits contribute most to PE time, and that a GEC tool's first-pass output can reduce editing time by about 4 seconds per sentence while improving final correction quality. PEET Scorer rankings align meaningfully with human judgment rankings, offering a human-centric metric for GEC usability. Future work includes broader language coverage and deeper investigation of cognitive effort in post-editing.

Abstract

Text editing can involve several iterations of revision. Incorporating an efficient Grammar Error Correction (GEC) tool in the initial correction round can significantly impact further human editing effort and final text quality. This raises a question central to quantifying GEC tool usability: how much effort can a GEC tool save users? We present the first large-scale dataset of post-editing (PE) time annotations and corrections for two English GEC test datasets (BEA19 and CoNLL14). We introduce Post-Editing Effort in Time (PEET) for GEC tools, a human-focused evaluation scorer that ranks any GEC tool by estimating PE time-to-correct. Using our dataset, we quantify the amount of time saved by GEC tools in text editing. Analysis by edit type indicates that determining whether a sentence needs correction, along with edits such as paraphrasing and punctuation changes, has the greatest impact on PE time. Finally, comparison with human rankings shows that PEET correlates well with technical effort judgment, providing a new human-centric direction for evaluating GEC tool usability. We release our dataset and code at: https://github.com/ankitvad/PEET_Scorer.
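The idea of ranking GEC tools by estimated time-to-correct can be illustrated with a minimal sketch. This is not the released PEET Scorer: the three feature columns, the toy timing values, and the names `fit_time_model`, `predict_time`, and `rank_tools` are all assumptions made for illustration. It assumes each sentence is represented by counts of edits that remain to be made after a tool's first pass, fits an ordinary least-squares regression from those counts to post-editing seconds, and ranks tools by mean predicted time (lower is better).

```python
# Illustrative sketch only; feature names and numbers are made up, not from the paper.
FEATURES = ["paraphrase", "punctuation", "other"]  # hypothetical edit-type counts


def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def fit_time_model(X, y):
    """Ordinary least squares via the normal equations; returns [intercept, w1, ...]."""
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    AtA = [[sum(a[i] * a[j] for a in A) for j in range(k)] for i in range(k)]
    Aty = [sum(a[i] * t for a, t in zip(A, y)) for i in range(k)]
    return solve(AtA, Aty)


def predict_time(w, counts):
    """Predicted post-editing seconds for one sentence's edit counts."""
    return w[0] + sum(wi * c for wi, c in zip(w[1:], counts))


def rank_tools(w, tool_edits):
    """Sort tools by mean predicted post-editing time, fastest first."""
    scores = {
        name: sum(predict_time(w, s) for s in sents) / len(sents)
        for name, sents in tool_edits.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1])


# Toy training data: per-sentence edit counts and observed PE seconds (synthetic).
X = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [2, 0, 1]]
y = [5.0, 13.0, 8.0, 7.0, 18.0, 23.0]
w = fit_time_model(X, y)

# Edits still needed after each hypothetical tool's first pass, per sentence.
tool_edits = {
    "tool_a": [[0, 1, 0], [0, 0, 0]],
    "tool_b": [[1, 0, 0], [0, 0, 1]],
}
ranking = rank_tools(w, tool_edits)
for name, seconds in ranking:
    print(f"{name}: {seconds:.1f} s per sentence (predicted)")
```

A linear model is used here only because it keeps the link between edit types and predicted seconds easy to inspect: each fitted weight reads as the estimated time cost of one more edit of that type, and the intercept as the time spent reading a sentence that needs no change.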

Paper Structure

This paper contains 30 sections, 4 figures, and 20 tables.

Figures (4)

  • Figure 1: ERRANT edit category and types.
  • Figure 2: Sentence correction edits extracted using the ERRANT toolkit.
  • Figure 3: Survey instructions for the editor on performing post-editing, used to obtain target corrections for our dataset.
  • Figure 4: Example source sentence and its first-pass edit from the survey. The editor can make further improvements in the text box before submitting the final target correction.