Noise Correction on Subjective Datasets

Uthman Jinadu; Yi Ding

TL;DR

This work proposes learning a more accurate representation of diverse opinions by combining multitask learning with loss-based label correction.

Abstract

Incorporating every annotator's perspective is crucial for unbiased data modeling. However, annotator fatigue and opinions that change over time can distort dataset annotations. To address this, we propose learning a more accurate representation of diverse opinions by combining multitask learning with loss-based label correction. We show that our novel formulation cleanly separates agreeing and disagreeing annotations. Furthermore, the method provides a controllable way to encourage or discourage disagreement. We demonstrate that this modification can improve prediction performance in both single- and multi-annotator settings. Lastly, we show that the method remains robust when additional label noise is applied to subjective data.
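The abstract does not spell out the exact formulation, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a multitask model with one output head per annotator and applies a generic small-loss rule: an annotation whose cross-entropy under that annotator's own head exceeds a fixed threshold is treated as noisy (for example, a fatigue-induced slip) and replaced with the head's most likely class. The function names `cross_entropy` and `correct_labels`, the fixed `threshold`, and the argmax replacement rule are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Negative log-likelihood of `label` under a predicted class distribution."""
    eps = 1e-12  # guard against log(0)
    return -math.log(max(probs[label], eps))


def correct_labels(
    probs: List[List[List[float]]],
    labels: List[List[int]],
    threshold: float,
) -> List[List[int]]:
    """Hypothetical loss-based label correction over per-annotator heads.

    probs[i][a]  : class distribution predicted by annotator head `a` on item `i`
                   (multitask setup assumed: one output head per annotator).
    labels[i][a] : label that annotator `a` gave to item `i`.
    Annotations whose loss exceeds `threshold` are replaced with that head's
    argmax prediction; a higher threshold corrects fewer labels.
    """
    corrected: List[List[int]] = []
    for item_probs, item_labels in zip(probs, labels):
        row: List[int] = []
        for head_probs, label in zip(item_probs, item_labels):
            if cross_entropy(head_probs, label) > threshold:
                # High loss: assume the annotation is noisy and use the head's prediction.
                row.append(max(range(len(head_probs)), key=head_probs.__getitem__))
            else:
                # Low loss: keep the original annotation.
                row.append(label)
        corrected.append(row)
    return corrected


if __name__ == "__main__":
    # Two items, two annotators, two classes (toy numbers, not real data).
    probs = [
        [[0.9, 0.1], [0.2, 0.8]],    # item 0: the two heads genuinely disagree
        [[0.95, 0.05], [0.3, 0.7]],  # item 1: annotator 0's label looks inconsistent
    ]
    labels = [[0, 1], [1, 1]]
    print(correct_labels(probs, labels, threshold=1.0))  # [[0, 1], [0, 1]]
```

In this toy run, the disagreement between the two annotators on item 0 survives because each label agrees with that annotator's own head, while annotator 0's label on item 1 is corrected because it conflicts strongly with what that head predicts. Because correction is measured per annotator rather than against a majority vote, a rule of this shape can remove within-annotator inconsistency without erasing genuine between-annotator disagreement.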