Estimating Agreement by Chance for Sequence Annotation

Diya Li; Carolyn Rosé; Ao Yuan; Chunxiao Zhou

TL;DR

A novel model for generating random annotations is introduced as the foundation for estimating chance agreement in sequence annotation tasks; its applicability and accuracy are assessed through a combination of simulation and corpus-based evaluation.

Abstract

In the field of natural language processing, correcting performance assessments for chance agreement plays a crucial role in evaluating the reliability of annotations. However, there is a notable dearth of research on chance correction for assessing the reliability of sequence annotation tasks, despite their prevalence in the field. To address this gap, this paper introduces a novel model for generating random annotations, which serves as the foundation for estimating chance agreement in sequence annotation tasks. Using the proposed randomization model and a related comparison approach, we derive the analytical form of the distribution, enabling the computation of the probable location of each annotated text segment and the subsequent estimation of chance agreement. Through a combination of simulation and corpus-based evaluation, we assess its applicability and validate its accuracy and efficacy.

Paper Structure

This paper contains 9 sections, 2 equations, 2 figures, 10 tables.

Introduction
Theoretical Foundation and Motivation
Method
Experiments
Conclusion and Discussion
Limitations
Ethics Statement
Acknowledgements
Appendix

Figures (2)

Figure 1: The probability distributions over all possible locations of each random segment in a sequence of length 100 annotated with four segments. The lengths of the four segments are 1, 5, 10, and 15, from left to right.
Figure 2: Converting the case of $k=r+1$ to the case of $k=r$ by merging two adjacent text segments $\psi_i$ and $\psi_j$. The blue box represents the segment $\psi_i$, and the red box represents the adjacent segment $\psi_j$.
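
The setting of Figure 1 can be approximated by simulation. The sketch below is not the paper's randomization model, which is not specified on this page; it assumes that segments may not overlap, that they may appear in any order, and that every valid placement is equally likely. The function names `sample_random_annotation` and `estimate_start_distribution` are hypothetical and introduced here only for illustration.

```python
import random
from collections import Counter

def sample_random_annotation(seq_len, seg_lengths, rng):
    """Draw one random placement of non-overlapping segments.

    Assumption (not taken from the paper): all orderings and all
    non-overlapping placements are equally likely.
    Returns (start, length) pairs in left-to-right order.
    """
    free = seq_len - sum(seg_lengths)  # tokens left unannotated
    if free < 0:
        raise ValueError("segments do not fit in the sequence")
    k = len(seg_lengths)
    order = list(seg_lengths)
    rng.shuffle(order)  # random left-to-right order of the segments
    # Treat each segment as a single item among the free tokens and
    # pick which of the free + k item slots hold segments.
    slots = sorted(rng.sample(range(free + k), k))
    placed = []
    offset = 0  # total length of segments already placed
    for i, (slot, length) in enumerate(zip(slots, order)):
        start = (slot - i) + offset  # free tokens to the left + earlier segments
        placed.append((start, length))
        offset += length
    return placed

def estimate_start_distribution(seq_len, seg_lengths, n_samples=10000, seed=0):
    """Monte Carlo estimate of the start-position distribution per segment.

    Assumes the segment lengths are distinct, as in Figure 1, so that a
    length identifies a segment.
    """
    rng = random.Random(seed)
    counts = {length: Counter() for length in seg_lengths}
    for _ in range(n_samples):
        for start, length in sample_random_annotation(seq_len, seg_lengths, rng):
            counts[length][start] += 1
    return {
        length: {s: c / n_samples for s, c in sorted(ctr.items())}
        for length, ctr in counts.items()
    }

if __name__ == "__main__":
    dist = estimate_start_distribution(100, [1, 5, 10, 15])
    for length, probs in dist.items():
        print(length, "most likely start:", max(probs, key=probs.get))
```

Under these assumptions, each estimated distribution can be compared against an analytical form; the simulation serves only as a sanity check and says nothing about how the authors derive their result.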
