Improving User Behavior Prediction: Leveraging Annotator Metadata in Supervised Machine Learning Models

Lynnette Hui Xian Ng; Kokil Jaidka; Kaiyuan Tay; Hansin Ahuja; Niyati Chhaya

TL;DR

The paper tackles the challenge of predicting user behavior from conversational text when crowdsourced labels are noisy. It proposes MSWEEM, a metadata-sensitive ensemble that uses annotator meta-features such as Throughput and Worktime to weight auxiliary label encodings before predicting the target variable. Empirical results show a 14% improvement on held-out Diplomacy data and about 12% on OffMyChest, with meta-features significantly enhancing performance across datasets and annotator cohorts. The work demonstrates the practical value of incorporating annotator behavior signals into NLP workflows, offering actionable guidance for crowdsourcing designs and robust modeling under label-quality uncertainty. It also provides insights into how different annotator cohorts (e.g., Master-qualified workers) contribute to data quality and model performance, informing quality control and data collection strategies.

Abstract

Supervised machine-learning models often underperform in predicting user behaviors from conversational text, hindered by poor crowdsourced label quality and low NLP task accuracy. We introduce the Metadata-Sensitive Weighted-Encoding Ensemble Model (MSWEEM), which integrates annotator meta-features like fatigue and speeding. First, our results show MSWEEM outperforms standard ensembles by 14% on held-out data and 12% on an alternative dataset. Second, we find that incorporating signals of annotator behavior, such as speed and fatigue, significantly boosts model performance. Third, we find that annotators with higher qualifications, such as Master-qualified workers, deliver more consistent and faster annotations. Given the increasing uncertainty over annotation quality, our experiments show that understanding annotator patterns is crucial for enhancing model accuracy in user behavior prediction.
